Hybrid video compression method

ABSTRACT

The invention concerns a method for compressing a digitally coded video frame sequence. In the method, a given frame is divided into blocks, and the information content of selected blocks is modified, relying on information contained in a neighboring block or blocks (prediction), and the blocks are converted from spatial representation into frequency representation. The information content of the transformed blocks is encoded by arithmetic coding. The efficiency of the coding is improved by various methods, such as dynamically partitioning the blocks into sub-blocks, or performing a compressibility analysis on the blocks before carrying out further transformations. The entropy coding uses a neural network to determine the parameters of the arithmetic coding. The frames are dynamically re-scaled, depending on available bandwidth and quality of the coded image.

The invention relates to video compression, more particularly to a method and apparatus for compressing a video frame sequence.

Multimedia plays an ever-increasing role in everyday life. Digital video images (moving images) can be found almost everywhere.

The amount of information contained in digital video calls for the improvement of the transfer bandwidth of digital video systems and for higher-capacity storage devices. An evidence of progress in the latter field is the rocketing increase of the capacity of Flash-type semiconductor memory modules in recent years. Unfortunately, the information content of digital video is extremely high, which makes it practically impossible or at least very expensive to use semiconductor memory for storing digital video data.

For instance, a single minute of full-resolution (D1) digital video needs 1.866 Gbyte storage capacity without compression. The bandwidth may reach 248 000 kbit/s.

In the early 1990s, the video encoding system known as MPEG1 appeared, which was capable of reducing the amount of digital video data to approx. 1/30th of the original. Due to quality issues, this system was improved, and the video encoder known as MPEG2 was born. This is applied primarily in DVD and DVB systems. An improved variety of the system, MPEG4, has been designed for the purposes of Internet-oriented, so-called streaming media.

The object of the present invention is a high-efficiency video data compression system. The proposed system makes it possible to store video data in semiconductor memory modules, permitting the application of low-cost RAM memory modules for storing video data, such RAM modules being widespread on the computer market. Such a non-mechanic storage system (i.e. a storage system that does not contain moving parts) can be advantageously applied in TV sets and in so-called set-top boxes used for satellite and cable TV applications, for intermediate storage of broadcast programs, and also as a replacement of conventional video tape recorders.

The coding system applying the inventive compression method can also be advantageously used for replacing the mechanical tape recording system of conventional video cameras, storing digital video data e.g. in Flash memory.

The proposed Hybrid Video Compression coding system enables the bandwidth of the digital data stream to be decreased to 300-600 kbit/s, while preserving good video quality, which means that for two hours of video only 400-500 Mbyte storage space is needed.

I. BACKGROUND ART

A number of different methods have been devised for compressing video frame sequences.

During the coding process of video frames the amount of coded data changes dynamically within minimum and maximum limits in order to maintain the desired quality and desired total data length.

Generally, known software programs perform compression control in accordance with the average coded frame length calculated for a sequence consisting of x frames. If the average length exceeds the previous average length value, the compression will be stronger (the compression ratio is increased). If the average length is smaller than the previous value, the compression ratio is decreased within specified minimum and maximum limit values.

The compression ratio is usually increased by selecting “stronger” quantization (a “rounding off” operation performed during transformations, see below). That causes a higher error rate and noise. Errors are often visually conspicuous and disturbing, especially under 1 Mbit/s. Since compressibility changes from frame to frame, in case of a constant expected quality it is usually difficult to maintain the desired average frame length.

The minimum and maximum length values usually cannot be set too high, because that would result in the control range becoming too wide and the coded length varying over too large a scale. It is often the case that the specified minimum value cannot provide the desired quality so it would be necessary to further increase the compression ratio.

I.1. The MPEG Method

One of the most widespread and best-known methods for compressing video data is MPEG. It can be regarded as a hybrid coding method, as it unites compression based on spatial redundancy with compression based on temporal redundancy.

The method based on spatial redundancy either reduces the information content of the frame by reducing details, or recognizes and exploits recurring features in the frame. The compression method relying on temporal redundancy, on the other hand, uses preceding and subsequent frames, and encodes only the changes between them.

One of the known methods for still image compression is JPEG. The method is also based on exploiting spatial redundancy. The image to be compressed is divided into blocks, and the information content of the blocks is reduced using discrete cosine transformation.

For easier comprehension of the novel features of the invention, let us briefly review the operation of the known MPEG system. The operation of the system is illustrated in FIG. 1, showing the functional elements thereof. Received blocks to be compressed are passed to selector 2 through input line 1. The selector 2 decides if the given block is an intra, inter, or predicted block, and treats it accordingly. The block, having passed the DCT (discrete cosine transform) module 3 and quantization module 4, is coded in the entropy coding module 13 and is written out through video multiplexer 14 onto output 15, into the compressed output data stream which is to be transmitted or stored. Transformed data of intra/inter blocks (see the explanation below) are reconstructed by inverse quantization module 5, IDCT (inverse discrete cosine transform) module 6 and selector 7, and these data are finally written into reference frame store 8. As explained in detail below, the quantization module 4 essentially divides the elements of the DCT-transformed block (the DCT coefficients) by the quantization factor. Coefficients are reconstructed by the inverse quantization module 5, in practice by multiplying them by the quantization factor. In other words, the inverse quantization module attempts to restore the original values of the DCT coefficients, at least to the extent allowed by the errors arising from the integer division. This is done with the intention of immediately decoding each frame or each block within a frame. The immediate decoding is necessary because the method uses the decoded frame as reference for coding the next frame. This procedure does not include entropy coding (decoding), because it would be superfluous, considering that entropy coding does not cause information loss which would have to be taken into account in the decoded reference frame.

The first frame has no reference, which means that it is always a so-called intra frame (I-frame). Thus, with the first frame the above procedure is repeated until the entire I-type frame is processed. Frames or blocks that use the previous or subsequent frame as reference are called P- and B-type frames/blocks, respectively.

Blocks of the next received frame are fed into the motion estimation module 10 that attempts to find a reference block for the block to be coded in the reference frame stored in reference frame store 8.

The motion estimation module 10 performs motion compensation using the found reference block, then the (original) block to be coded is subtracted from the reference block by prediction module 9, with the difference being passed on to the modules carrying out the above transformations: to selector 2, DCT transformation module 3 and quantization module 4, then to entropy coding module 13, and finally to multiplexer 14. The motion vector (MV), produced as the result of motion compensation, is coded by the entropy coding module 12 and is passed on to the multiplexer 14, which inserts the motion vector into the output data stream 15.

The module 7 is a selector/sum module that leaves data of I-type blocks unchanged, while in case of P- and B-type blocks it adds the found reference block to the inverse transformed differences. The block thus reconstructed is subsequently written into the reference frame store.

Positions found during the search are converted into vectors and coded by the entropy coding module 12.

These operations will now be described in more detail.

I.2.1.

Compression systems relying on temporal redundancy encode only the changed portions of consecutive frames. In practice, this is done by dividing the frames into blocks and comparing individual blocks pixel by pixel with pixels located in a search range of the previous or the subsequent frame. The procedure is illustrated in FIG. 2, showing that a reference block best matching a given block 20 of a frame 17 is being searched for in the search range 21 located in the previous frame 16 or in the search range 19 of the subsequent frame 18. The reference block can be located anywhere, it need not coincide with the search ranges (shown in grey) designated in the reference frames 16 or 18. It may of course happen that in such cases the reference search is unsuccessful in the given reference frame(s). Evidently, the reference frames 16, 18 are not divided into blocks for the reference search; the blocks are shown in FIG. 2 only for the sake of better overview.

The comparison is performed using the following expression:

$$MSE(k,l,u,v) = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left(I_{n}(k+i,\,l+j) - I_{n-1}(k+i+u,\,l+j+v)\right)^{2}$$

where MSE is the so-called Mean Square Error, quantifying in essence the comparison of individual pixels of the block to be coded and the sought reference block, the comparison being performed pixel by pixel,

-   indices k, l are indices determining the position of the block to be coded,
-   indices u, v are indices pointing to the search range located in the reference frame,
-   and M, N are the horizontal and vertical size of the block.
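
By way of illustration, the full search over such a range can be sketched as follows (a minimal Python sketch, assuming the frames are available as numpy arrays; the function names, the 8×8 default block size and the ±32 pixel range are illustrative assumptions):

```python
import numpy as np

def mse(current, reference, k, l, u, v, M=8, N=8):
    """Mean square error between the MxN block of the current frame at (k, l) and the
    candidate block of the reference frame displaced by (u, v)."""
    cur = current[k:k + M, l:l + N].astype(np.int64)
    ref = reference[k + u:k + u + M, l + v:l + v + N].astype(np.int64)
    return float(np.mean((cur - ref) ** 2))

def full_search(current, reference, k, l, M=8, N=8, search=32):
    """Scan a +/-search pixel range centred on (k, l) and return the best displacement."""
    best_pos, best_err = None, float("inf")
    for u in range(-search, search + 1):
        for v in range(-search, search + 1):
            if 0 <= k + u <= reference.shape[0] - M and 0 <= l + v <= reference.shape[1] - N:
                err = mse(current, reference, k, l, u, v, M, N)
                if err < best_err:
                    best_pos, best_err = (u, v), err
    return best_pos, best_err
```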

Frames located consecutively before and after the frame containing the block to be coded are uniformly called reference frames.

During the search procedure, a search range with a typical size of −32/32 pixels is set up in the reference frame. The position of the current block to be coded will be designated as the centre of the search range. That is, if the coding process is currently at position 10, 10 then this position will be the centre of the search range in the reference frame. The range is scanned step by step with the current block and the error (the above specified MSE value) is calculated in each step. The best reference candidate will be the position where the search gives the smallest error, that is, the block best matching the current block to be coded. Based on the error value, it can be determined whether the search can be regarded as successful or not. In case of a successful search the sought reference block position is obtained in full resolution mode. However, in most cases the search result is not satisfactory.

If we examine the issue in more detail, it soon turns out that the cause of excessive errors (and therefore unsuccessful searches) is the error measurement method. For instance, in case of noisy frames even in the best position the two blocks cannot be identical; the information content of the blocks is different just because of the noise. This situation also arises when the displacement of a block is not an exact multiple of the pixel size, i.e. the displacement ends somewhere between two integer pixel positions, so the real displacement can only be exactly expressed as a fraction of the pixel size.

Therefore, in order to provide a proper match between the current block and its reference, the noise and other disturbing factors should be filtered. Usually low-pass filters are applied for that purpose. The filters perform the damping of high-frequency components depending on the predetermined sampling frequency, and are thereby able to suppress picture noise to some extent. Filters usually compensate errors by averaging each pixel with pixels located beside or above it, or with both. For instance, the so-called ½ pixel-resolution reference frame is produced by inserting a pixel between every two neighbouring pixels of the original frame, with the average value of the two neighbouring pixels. The result is substantially the same if a new pixel is created from the average values of every two neighbouring pixels, using the new pixels to create a new frame with a resolution identical to the original. This frame is practically shifted ½ pixel left relative to the original in case it was interpolated horizontally, and ½ pixel up in case it was interpolated vertically.

I.2.2.

For error compensation, usually bilinear filters are proposed. Bilinear filtering involves creating three interpolated (filtered) reference frames, namely one with vertical, one with horizontal, and one with both vertical and horizontal interpolation. It should be remembered that a reference frame is the decoded (reconstructed) variety of a coded intra (or inter) frame. This is done in order to prevent further frames, which are based on the given reference frame, from being further deteriorated by the quantization error present in the decoded reference frame.

In the next phase the search is carried on with ½ pixel resolution (that is, it is continued in the filtered reference frame). A selector S is generated from the bits of the matched x, y position using the expression S=(y & 1)*2+(x & 1) (where & stands for a logical AND operation). Next, a new search range is established with limits −1/+1, −1/+1, i.e. at (x+(−1/+1), y+(−1/+1)), with the position of the best block, found in the first search phase (the search in the non-filtered reference), set as the centre of the range.
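
The selector computation itself is trivial and can be illustrated with a short sketch (the mapping of the selector values to the particular interpolated references is an assumption based on the description above):

```python
def interpolation_selector(x, y):
    """S = (y & 1)*2 + (x & 1): 0 selects the unfiltered reference, 1 the horizontally
    interpolated, 2 the vertically interpolated and 3 the bidirectionally interpolated
    reference (assumed mapping)."""
    return (y & 1) * 2 + (x & 1)
```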

Now the search is repeated using the new range. The selector picks the required interpolated reference frame according to the positions, and also the particular block in the frame that is determined by the positions, and calculates the squared error relative to the current block. At the end of the search the position where the error was the smallest is retained. So, pixels of the current block will be subtracted from pixels of the block that was pointed to by the selector when the error was smallest. This is illustrated in FIG. 3. A block similar to or identical with block 24 of the frame 22 to be coded is sought in the reference frame 23. When the best matching block 25 is found, the contents thereof are fed into the Sad/Sub (comparison) module 29 through data line 28, with the comparison being performed using either the unfiltered, the horizontally interpolated (filtered), the vertically interpolated (filtered), or the bidirectionally interpolated (filtered) reference, depending on which one has been selected by the selector with the expression S=(y & 1)*2+(x & 1) on the basis of the position values. This procedure involves only a non-filtered reference that is locally filtered (interpolated). The procedure is the same in case three previously filtered reference frames (horizontally, vertically, and bidirectionally interpolated) are available. In that case the selector chooses the appropriate reference frame from the above alternatives (the non-filtered frame and the three differently filtered ones), and forwards the block located at position x, y for further comparison and processing.

Finally, either the squared error or, in case of the smallest-error block, the difference of the reference block 25 and the block 24 to be coded is passed on to the output 30.

I.2.3.

Next, either the resulting differences (in case of a successful search) or, if the search was unsuccessful, the current block itself is converted with DCT transformation from spatial representation to frequency representation. Then the unnecessary precision of the data is reduced by the so-called quantization operation. This essentially involves discarding higher-order coefficients produced by the DCT, since these coefficients are usually small. The remaining DCT coefficients are also either small values or zeroes, which may be efficiently coded by entropy coding, simultaneously with the position value established above. This procedure is illustrated in FIGS. 4-5.

FIG. 4 shows the schematic block diagram of the coding process for an intra frame. The input frame 31 has Y, UV format where Y contains the luma information and UV contains the colour difference (chroma) information. The frame 31 comprises e.g. 8×8-pixel sized blocks. Thus, individual blocks are 8×8 matrices, with separate Y and UV blocks associated to a given frame. In the following, if it is not indicated otherwise, Y and UV blocks are coded similarly, so they are generally represented in FIG. 4 by the matrix f(i,j). The result of the DCT transformation performed in step 31 on a given block is the matrix denoted F(u,v), also comprising 8×8 elements. Modifying the individual elements of the matrix F(u,v) in step 33 we obtain the quantized matrix Fq(u,v), designated in FIG. 4 with the reference numeral 37. As already indicated earlier, quantization essentially involves the reduction of unnecessary precision of the data, carried out in practice by discarding certain elements of the F(u,v) matrix. Accordingly, hereafter the information contained in the original block is contained in the quantized matrix 37. The first element of the quantized matrix 37, the so-called DC coefficient, is reduced in step 34 with delta pulse code modulation (DPCM). This essentially means that DC coefficients of subsequent blocks, having the same order of magnitude, are subtracted from one another, and in this manner smaller-amplitude DC coefficients are obtained, which can be coded more efficiently by the static Huffman entropy coding performed in step 36. The other elements of the matrices, the so-called AC coefficients, are coded applying the so-called run-length coding, which is based on recording only the occurrence count and the coefficient value for reoccurring coefficients (the procedure is described in greater detail below). DC and AC coefficients are retrieved from the quantized 8×8 matrices following the so-called Zig-Zag scan order, as illustrated at the matrix 37. Thereby the coefficients are forwarded to the entropy coder in increasing order of frequency, starting with lower-frequency coefficients, towards the higher-frequency coefficients.

I.2.4.

In case a block is not an intra but an inter coded block, FIG. 5 shows the coding of a matched and compensated P-type block. It is sought to find a reference block 43 for the block 42 to be coded. The block 42 is located in its original position 40, between the blocks 38 of the current frame to be coded. The reference block 43 may be located in the position indicated with the reference numeral 41. The search is performed by stepping the block 42 to be coded through the search range 39 of the reference frame. If the best match is found, the block 42 to be coded is subtracted from the reference block 43 (or the other way round) to generate the error between the block 42 to be coded and the matched reference block 43. In this manner, the luminance 44 and chrominance 45 components of the error are obtained. These components are subjected to DCT transformation, quantization, and run-length coding in step 46, and finally, in step 47 the run-length coded data undergo further entropy coding.

Since the original frame will not be available as a reference during decoding, only an already decoded frame can be used as reference. That is why it is necessary to reconstruct reference frames from the coded data already during the process of coding. The simplest way to do this is to perform the inverse quantization and inverse DCT immediately after quantization.

In case the reference search was successful, the matched reference block is added to the inverse transformed block, and the inverse transformed block is written to the current position of the current frame. Because the frame obtained in this step will serve as a reference for the next frame, all blocks of the current frame are updated.

If the reference search was successful, the block is classified as an inter block, whereas upon an unsuccessful search the block is classified as an intra block. Block classification data are needed for the decoder, because they indicate how the block was generated during the coding process. Frames are classified according to the same principle.

If no reference is found for a frame (or if the frame has changed relative to the reference to such an extent that the coding of the difference would require substantially the same amount of data as the coding of the original frame), the entire frame is transformed using DCT and is classified as an I-frame. If a frame uses only the preceding frame as reference, it is classified as a P-frame, while in case the frame uses both the preceding and the subsequent frames as reference, it is categorized as a B-frame.

FIG. 6 illustrates that for a B-frame, the coding system searches for a reference 49 of the block C to be coded in both the preceding frame 48 and the subsequent frame 50, finally keeping as reference either the one that produced the smallest error or the linearly interpolated average of the two.

First the MSE value of the matched blocks P and C is computed, then MSE is also calculated for blocks F and C. Next, the system calculates the MSE for the block generated by the expression 0.5*(P+F) and the block C, and the alternative yielding the smallest MSE is finally coded. That is, in case the MSE of the block C was best relative to the preceding frame 48, then the block P of said frame 48 becomes the reference block for the block C. In case the best result was produced by the subsequent frame 50, then the block F of the frame 50 becomes the reference of C, and if the best result is obtained with the average of F and P, then both these blocks are used as references for C. If none of the results is good enough, then the block is coded as an intra block. The block descriptor structure must always show the source of the reference, that is, how the block C was coded.
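
For illustration, the selection among the three candidates may be sketched as follows (a Python sketch; C, P and F are assumed to be equally sized numpy arrays, and the threshold for falling back to intra coding is a hypothetical parameter):

```python
import numpy as np

def choose_b_reference(C, P, F, intra_threshold):
    """Pick the reference for a B block: the matched past block P, the matched future
    block F, their average, or fall back to intra coding if none is good enough."""
    candidates = {"past": P, "future": F, "average": 0.5 * (P + F)}
    mode, err = min(((m, float(np.mean((C - R) ** 2))) for m, R in candidates.items()),
                    key=lambda t: t[1])
    if err > intra_threshold:
        return "intra", None
    return mode, candidates[mode]
```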

I.2.5.

In case of a successful reference search, the established reference positions are converted to vectors and the vectors are coded. The vectors specify the magnitude and the direction of the displacement of the block to be coded relative to the reference.

The application of DCT can be justified by the following facts:

In case a portion of the coefficients is deleted (zeroed out), the inverse DCT transformation is capable of reconstructing the original block data with a very good approximation.

The question may arise: why use DCT when it is only a variety of FFT? The answer is that there is empirical evidence that DCT gives better function approximation for video encoding than FFT. This is illustrated with some concrete values shown as examples in FIG. 7. The FFT coefficients 52 and DCT coefficients 53 are produced by performing, respectively, FFT and DCT transformations on the input data 51. After quantization (that is, after discarding or truncating coefficients) the truncated FFT coefficients 54 and truncated DCT coefficients 55 are obtained. Following the inverse transformations the IFFT reconstructed data 56 and the IDCT reconstructed data 57 are obtained. Plotting the reconstructed data with the curves 58 and 59 it is seen that FFT is more sensitive to coefficient truncation.

I.2.6.

The purpose of quantization is the reduction of the precision of frame data (the level of the details), discarding unnecessary details.

If a given block is subjected to close examination, it can be noticed that the block contains many details that are not perceived visually. The reason is that the sensitivity of the human eye increases toward lower spatial frequency components. Thus, if higher-frequency components of the frame are more strongly damped than lower-frequency components, up to a certain limit no change can be visually perceived in the decoded frame, though the compressibility of the data has increased. This kind of quantization is applied by the MPEG1-2 standards. According to another known method, the frequency distribution of the coefficients is disregarded, so each coefficient is divided by the same constant (H.26x-MPEG4). The most important function of quantization is the reduction of the number of bits that describe the DCT coefficients. In other words, it is desired to describe DCT-transformed coefficients with as few bits as possible. The fewer the number of bits describing a coefficient, the better the compressibility will be. However, the error caused by the integer division also increases when the value of the divisor is increased.

There exist other known methods for reducing the number of bits representing DCT coefficients. For example, the so-called DC (delta code) prediction is based on the recognition that values located at the 0-th position of consecutive DCT blocks are only slightly different from each other. Hence, it is possible to reduce the value of the DC components and also the number of bits representing these values if a DC value is subtracted from the preceding one. (The coefficient located at the 0-th position of a block is called DC, while the others are called AC.)

The process of AC prediction is similar to DC prediction, with the difference that coefficients are scanned in different directions, and are averaged using various methods. A number of solutions are known for AC/DC prediction, so they need not be described in detail here.

II.

The general objective of the invention is to improve the compression efficiency of the known method presented above, or more particularly, to provide efficient compression with relatively low computational load. This aim has been achieved with the inventive methods described in the independent claims attached to this description.

Although the above methods can be effectively applied by themselves, their simultaneous use can result in especially significant improvements, on the one hand because the effects of individual methods are added, and on the other hand because the individual methods concern different phases of the compression process.

Other apparatuses and software (computer program products) performing the steps of the inventive methods, and other, substantially inverse methods carrying out the decompression of coded data are also objects of the present invention.

The invention is explained in detail with reference to the attached drawings, where

FIG. 1 shows a schematic block diagram of prior art MPEG encoding,

FIG. 2 illustrates the method of finding inter frames,

FIG. 3 illustrates the method for comparing the reference block and the block to be coded,

FIG. 4 shows the steps of the DCT transformation, quantization, and subsequent entropy coding,

FIG. 5 illustrates the method of subtracting the block to be coded from the reference block,

FIG. 6 illustrates the process of searching for reference frames among previous or subsequent frames,

FIG. 7 shows differences between the DCT and FFT methods,

FIG. 8 is a schematic functional diagram of the inventive hybrid video encoder,

FIG. 9 illustrates the prediction modes of intra blocks,

FIG. 10 illustrates how the prediction mode is selected from the possible prediction modes,

FIG. 11 shows an embodiment of the encoding of the block partitioning,

FIG. 12 shows possible block partitionings that use sub-blocks of different sizes,

FIG. 13 illustrates the partitioning that comprises three different block sizes,

FIG. 14 illustrates the partitioning comprising two different block sizes,

FIGS. 15 a-c show search modes applicable with P and B frames,

FIG. 16 illustrates the reference search method that uses reduced samples, showing block sizes and block patterns used during the search process,

FIG. 17 shows how the interpolated reference frames used in the method according to the invention are generated,

FIG. 18 a illustrates the selection (addressing) of neurons in the neural arithmetic coding unit according to the invention, and also shows the layout of unions within the address range,

FIG. 18 b shows the schematic diagram of the neural network applicable for an embodiment of the neural bandwidth control system according to the invention,

FIG. 18 c shows the schematic diagram of the neural network applicable for another embodiment of the neural bandwidth control system according to the invention,

FIG. 19 shows the modification of the data path of the input data in the high dynamic-range neural bandwidth control system,

FIG. 20 illustrates the dynamic scaling method according to the invention,

FIG. 21 shows the signal/noise characteristics of the transmission realized during the inventive dynamic scaling method, compared to the characteristics without dynamic scaling,

FIG. 22 is the flowchart of the decoding process for video data coded with the method according to the invention,

FIG. 23 is the schematic block diagram of an audio and video data coding system applying the method according to the invention,

FIG. 24 is the schematic block diagram of a system applicable for decoding audio and video data coded according to the invention.

III.

The logical structure (schematic functional diagram) of the hybrid coding system according to the invention is shown in FIG. 8. The main functional units of the system are in many ways similar to the known MPEG coding system shown in FIG. 1. The input video data 60, in other words the frames to be coded, are fed into the frame scaling module 61, which, according to a number of different criteria (discussed below in detail), either reduces the size of the input frame or leaves it unchanged. The entire system is controlled by a coding control unit 62, with the exact functions thereof clarified later in the present description. Frames or blocks are coded with intra or inter coding depending on the intra/inter switch 63. Blocks are directed to the output 73 in a transformed, quantized and coded state, having passed through DCT transformation module 64, quantization module 65 and entropy encoding module 72. Reference frames needed for coding inter frames are generated by an inverse quantization module 66 and an inverse DCT module 67, from which the reconstructed reference frames are fed into the frame store 70 through a de-block filter 69. Motion compensation, i.e. production of filtered reference frames and compensated motion information 74 (motion vector and the subtracted block), is carried out by a module designated with the reference numeral 68 (with a resolution that is adjustable between ½, ¼, and ⅛ pixels). The frame store 70 stores the current reference frame, with the blocks thereof being automatically refreshed (actualized). The module 71 performs the identification of the changes and finds the block partitioning which is best suited for tracking the changes in the frame, and the module 71 describes the best block partitioning using a Quad-tree structure (detailed below). The entropy encoding module 72 is a so-called neural arithmetic compressor (see below).

In the following we explain in greater detail certain aspects of the encoding (compression) process according to the invention.

The term “prediction” is used in the present description in a sense that covers reversible mathematical expressions which, exploiting actual or potentially expectable similarities, are based on operations of a substantially averaging character, and return original values with a good approximation. This means that data reconstruction can only yield approximated values of the original data, in other words, expected values are “predicted”. For practical purposes a specific function is used to perform these calculations (that usually involve averaging operations).

The inventive compression method has an essentially hybrid nature, because it exploits both temporal and spatial redundancy. The implemented compression is based on a hierarchical block structure containing blocks of dynamically varying sizes. The reference search uses not only the frames immediately preceding and following the current frame, but further preceding and subsequent frames as well, with a maximum depth of +1 and −3 frames (that is, reference search is allowed in one following and three preceding frames). High-level motion compensation is realized by the method, with a resolution ranging from ½ to ⅛ pixels. The entropy compressor performs optimized arithmetic coding based on multi-level prediction.

In the following, the term “intra prediction” means a reversible mathematical expression that, depending on values in one or more reference blocks, reduces or zeroes out pixel values of the current block to be coded.

For the sake of clarity, it has to be pointed out that in the present description references are made to two fundamentally different prediction principles:

-   1. The so-called “intra prediction”, applied for coding intra frames, and
-   2. The prediction used in neural entropy coding.

These are identified at the appropriate place.

IV. Fast Intra Prediction of the Blocks to be Coded

IV.1

Intra prediction is based on the observation that neighbouring blocks in a given frame often have similar properties, and therefore spatial correlations among neighbouring blocks can be used for reduction. Thus, the information content of selected blocks (the blocks to be coded) can be modified on the basis of the information content of predetermined pixels of a block or blocks adjacent to the selected block using the intra prediction procedure mentioned above. In a possible aspect, intra prediction is realized using the vertical line 76 of the block located before the block to be coded or the horizontal line 77 (that is, the pixel values thereof) of the block located above said block, or both of them, as schematically depicted in FIG. 9. Let vertical line 76 be called B, and horizontal line 77, A.

Let us consider a concrete example: Let

-   221, 224, 230, 232

be elements of the last horizontal line 77 of the block located above the block to be coded.

Let the block to be coded be the following:

-   219, 223, 226, 232
-   219, 224, 224, 231
-   217, 220, 221, 229
-   214, 220, 221, 228

Now we subtract the horizontal line 77 from each horizontal line of the block to be coded. We obtain the predicted block:

-   2, 1, 4, 0
-   2, 0, 6, 1
-   4, 4, 9, 3
-   7, 4, 9, 4

This example uses horizontal prediction. The advantage of the prediction is the bit reduction of the pixels, and, as the tendency of values increasing from left to right that could be observed in horizontal line 77 proved to be the same in the case of the block to be coded, the entropy of the data to be coded has also improved. The distribution of the resulting values is: 2ד2”, 2ד1”, 2ד0”, 4ד4”, 2ד9”, 1ד7”, 1ד6”, 1ד3”, which shows a higher symbol occurrence rate than the original block.
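
The worked example can be reproduced with a few lines of Python (an illustrative sketch only; the difference is computed here as line A minus each row of the block, which reproduces the magnitudes listed above):

```python
import numpy as np

# Line A: the last row of the block located above the current block (from the example).
A = np.array([221, 224, 230, 232])

# The 4x4 block to be coded.
block = np.array([[219, 223, 226, 232],
                  [219, 224, 224, 231],
                  [217, 220, 221, 229],
                  [214, 220, 221, 228]])

# Horizontal prediction: the difference between line A and every row of the block.
predicted = A - block
print(predicted)
# [[2 1 4 0]
#  [2 0 6 1]
#  [4 4 9 3]
#  [7 4 9 4]]
```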

The more transformation/prediction types are defined, the better the entropy that can be achieved by applying one of them. Thus, in one practical embodiment of the proposed solution a modified (predicted) block is coded instead of the original block 75 to be coded. The predicted block is computed by subtracting line-by-line the neighbouring horizontal line 77 (A) or vertical line 76 (B) from the horizontal or vertical lines of the block 75 to be coded, or by subtracting from the original pixel values the average values of A and B, the calculation typically being performed with the formula (pixel−(A+B)/2), thereby obtaining the so-called predicted block. This is a known solution per se, but we have recognized that the efficiency of intra coding can be significantly improved if we allow using blocks of different sizes, even mixing them at the same time, by which we usually obtain blocks that can be better compressed. However, the high number of calculations which must be performed renders this solution infeasible in itself. In the known methods, the final, best compressed predicted block is found by effectively performing the DCT transformation and entropy coding on the predicted block. Only thereafter is it established to what degree a given predicted block can be compressed. However, the best compressed predicted block found in the previous step can be used only if the error of the reconstructed block (compared to the original one) is not too high. Consequently, to measure the error the inverse transformations must also be carried out. Altogether, very high computing capacity is required.

Thus, according to the invention, a compressibility analysis is performed on the block to be coded before carrying out the DCT transformation. Based on the compressibility analysis, the block is coded with DCT and entropy coding. In most cases, however, the compressibility analysis reveals that it is worth examining the compressibility of the block also by dividing the block to be coded into further sub-blocks. In this case, the compressibility of blocks associated to the various block partition variants is analyzed, and that partitioning is selected which promises the best potential results. Finally, after the block partitioning followed by the intra prediction, DCT transformation is carried out on the basis of the selected, potentially most favourable block partitioning.

IV.2.

This process is now described in more detail.

As it can be seen in FIG. 12, in the shown embodiment of the invention the possible block sizes are 16×16, 8×8, and 4×4 pixels. The (intra) prediction of blocks with different sizes can be carried out in a plurality of ways. These are listed on the following pages.

IV.2.1 Prediction of 4×4 Blocks

Six types of prediction are defined.

1. DC Prediction

Let S0 be the prediction vector:

-   If A and B both exist, then
-   S0=Σ(Aj+Bj+4)/8
-   or else, if only A exists,
-   S0=Σ(Aj+2)/4
-   or else, if only B exists,
-   S0=Σ(Bj+2)/4
-   or else S0=128

Thus the predicted block is computed according to the formula DCP(j,i)=IB(j,i)−S0 where j=0 . . . 3, i=0 . . . 3, where IB is the block to be coded, and DCP is the predicted block.

2. Horizontal Prediction

DCP(j,i)=IB(j,i)−A(i) where j=0 . . . 3, i=0 . . . 3

3. Vertical Prediction

DCP(j,i)=IB(j,i)−B(i) where j=0 . . . 3, i=0 . . . 3

4. Diagonal Prediction Combined with Horizontal and Vertical

T(0,0)=(B(3)+2*B(2)+B(1)+2)/4
T(1,0)=(B(2)+2*B(1)+B(0)+2)/4
T(2,0)=(B(1)+2*B(0)+A(−1)+2)/4
T(3,0)=(B(0)+2*A(−1)+A(0)+2)/4
T(4,0)=(A(−1)+2*A(0)+A(1)+2)/4
T(5,0)=(A(0)+2*A(1)+A(2)+2)/4
T(6,0)=(A(1)+2*A(2)+A(3)+2)/4
DCP(j,i)=IB(j,i)−T(j−i+3) where j=0 . . . 3, i=0 . . . 3

5. Diagonal with Vertical

T(j,i)=A(3) where j=0 . . . 3, i=0 . . . 3
T(0,0)=(A(0)+A(1))/2
T(1,0)=A(1)
T(0,1)=T(2,0)=(A(1)+A(2))/2
T(1,1)=T(3,0)=A(2)
T(0,2)=T(2,1)=(A(2)+A(3))/2
DCP(j,i)=IB(j,i)−T(j,i) where j=0 . . . 3, i=0 . . . 3

6. Diagonal with Horizontal

T(j,i)=B(3) where j=0 . . . 3, i=0 . . . 3
T(0,0)=(B(0)+B(1))/2
T(0,1)=B(1)
T(1,0)=T(0,2)=(B(1)+B(2))/2
T(1,1)=T(0,3)=B(2)
T(2,0)=T(1,2)=(B(2)+B(3))/2
DCP(j,i)=IB(j,i)−T(j,i) where j=0 . . . 3, i=0 . . . 3

IV.2.2. Prediction of blocks with the size of 8×8 pixels can be implemented according to similar principles.

In this case, four possible prediction types are defined:

1. DC Prediction

If A and B both exist, then

-   S0=Σ(Aj+Bj+8)/16
-   or else, if only A exists,
-   S0=Σ(Aj+4)/8
-   or else, if only B exists,
-   S0=Σ(Bj+4)/8
-   else S0=128

2. Horizontal Prediction

DCP(j,i)=IB(j,i)−A(i) where j=0 . . . 7, i=0 . . . 7

3. Vertical Prediction

DCP(j,i)=IB(j,i)−B(i) where j=0 . . . 7, i=0 . . . 7

4. Diagonal Prediction

DCP(j,i)=IB(j,i)−(A(i)+B(i))/2 where j=0 . . . 7, i=0 . . . 7

IV.2.3. Finally, prediction of 16×16-pixel blocks is also similar:

In this case four prediction types are defined.

1. DC Prediction.

-   If A and B both exist, then
-   S0=Σ(Aj+Bj+16)/32
-   or else, if only A exists,
-   S0=Σ(Aj+8)/16
-   or else, if only B exists,
-   S0=Σ(Bj+8)/16
-   else S0=128

DCP(j,i)=IB(j,i)−S0 where j=0 . . . 15, i=0 . . . 15 (IB is the current block, DCP is the predicted block)

2. Horizontal Prediction

DCP(j,i)=IB(j,i)−A(i) where j=0 . . . 15, i=0 . . . 15

3. Vertical Prediction

DCP(j,i)=IB(j,i)−B(i) where j=0 . . . 15, i=0 . . . 15

4. The So-Called “Plane” Prediction

-   v=5*((Σ((A(j+7)−A(j−7))*j))/4)/4
-   h=5*((Σ((B(j+7)−B(j−7))*j))/4)/4
-   k=A(15)+B(15)
-   T(j,i)=(k+(i−7)*h+(j−7)*v+16)/32

DCP(j,i)=IB(j,i)−T(j,i) where j=0 . . . 15, i=0 . . . 15

IV.2.4. So, in this aspect the proposed method uses three different block sizes and as many as 14 prediction types. It is easy to see that high computing capacity would be required if the known method were applied, as all the predictions and subsequent calculations would have to be carried out 4 times in 16×16-mode (that is, in case the allowed block size is 16×16), also 4 times in 8×8-mode, and 16 times in 4×4-mode.

In practice, this means that if the 16×16-sized block has not been divided into sub-blocks, the P→DCT→Q→IQ→IDCT→IP transformation sequence, the subsequent entropy coding of coefficients, and determining MSE values for original and inverse transformed blocks must be performed 4 times. If the division of blocks into sub-blocks is allowed, according to the method described above, the total number of transformations increases to 16 (4*4) or even to 96 (6*16) (although with smaller block sizes).

For this reason, the selection of the best predicted block is performed according to the flowchart presented in FIG. 10. Block data 79 pass through multiplexer/selector 80, which, depending upon the block size, selects the current prediction mode 81 out of those enumerated above. Selector 82 b can be set by the user to direct block data 79 of the predicted block into processing module 82 c either directly or through a Hadamard transform module 82 a. The processing module 82 c produces the absolute squared sum of the block, with comparator 83 evaluating the resulting sum. In case the value is smaller than a reference threshold value, said reference threshold is overwritten by the momentary sum, with the current prediction mode being stored together with the predicted block by processing module 84. Then the multiplexer/selector 80 selects the mode for the next prediction, and the whole process is repeated until all available modes, in other words prediction modes pertaining to different potential partitionings of the block, are tested. At the end the best predicted block, and also the prediction mode by which it was generated, is determined.

The predicted block is summed by processing module 82 c using the following formula:

$$sum = \sum_{i=0}^{M} \left| pixel_{(i)} \right|^{2} \qquad (\text{Equation I})$$

where M is the block length.

In certain cases it could prove disadvantageous that the above method does not take into account spatial frequencies. That is the reason for including the Hadamard transform module 82 a. The transformation is carried out before the squared sum is computed, and the user can decide whether to apply it or not. The definition of the Hadamard transform is given below. This transformation is similar to the DCT in the sense that it generates frequency components/coefficients of the transformed block. In most cases, a more efficient prediction will be the result, where the transformed block contains fewer frequency components/coefficients.
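
A minimal sketch of the selection loop of FIG. 10 might look as follows (Python; the predictor functions, the dictionary interface and the restriction to 4×4 predicted blocks in the Hadamard branch are illustrative assumptions):

```python
import numpy as np

def hadamard_4x4():
    """4-point Hadamard matrix; in practice the transform needs only additions/subtractions."""
    h2 = np.array([[1, 1], [1, -1]])
    return np.kron(h2, h2)

def cost(predicted, use_hadamard=False):
    """Equation I: the sum of squared values of the predicted block,
    optionally computed after a Hadamard transform of the block."""
    b = predicted.astype(np.int64)
    if use_hadamard:
        h = hadamard_4x4()          # this sketch assumes 4x4 predicted blocks
        b = h @ b @ h.T
    return int(np.sum(b * b))

def best_prediction(block, predictors, use_hadamard=False):
    """Try every prediction mode and keep the one giving the smallest cost.
    `predictors` maps a mode name to a function returning the predicted block."""
    best_mode, best_block, best_cost = None, None, float("inf")
    for mode, predict in predictors.items():
        p = predict(block)
        c = cost(p, use_hadamard)
        if c < best_cost:
            best_mode, best_block, best_cost = mode, p, c
    return best_mode, best_block, best_cost
```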

After the best block partitioning and the corresponding prediction mode have been determined as described above, the remaining transformations (DCT . . . ) are carried out and the block is coded with the entropy coding module.

Comparing the inventive method to the known solution, it turned out that the quality/compression ratio of our method is approx. 1 dB better. Though better compression could be achieved with the known method, it was always at the expense of quality. The inventive method, however, provides better image quality and practically the same efficiency of compression.

Another important feature of the inventive method is that the computational load thereof is approximately one-tenth that of the known methods if the Hadamard transformation is performed, and approximately 1/16 of the computational load required by known methods if the Hadamard transform is not applied.

As the partitioning of blocks must somehow be recorded for successful decoding, blocks are partitioned according to a so-called Quad-tree structure. Each block of 16×16 pixels can be conceived as a root with four descendant leaves, which in turn can be further decomposed into four other descendants down to the block size of 4×4. This is illustrated in FIG. 11, where in one of the 4×4-pixel sub-blocks (03) of the given block further 2×2-pixel sub-blocks are shown, and in one of these 2×2-pixel blocks individual pixels are shown. The graph beside the image of the block illustrates how individual sub-blocks, or even individual pixels of the block, can be identified if needed. It can be clearly seen that as the resolution increases, the amount of data needed for describing the given partitioning also increases.

This partitioning method (allowing three different block sizes) is proposed as a default for the inventive method. In another implementation, however, this can be overridden by the user or the system itself, and a mode that uses only two different block sizes can be set. This will expediently pertain to the whole frame, and cannot be mixed with the three-block-size mode, preventing that one block in a given frame is coded in two-block mode and another in three-block mode.

The intrapredictive coding in one proposed aspect of the invention allows for 4 possible modes using dynamically changing block sizes (I-type). Thus, during the coding of an entire I-frame it is allowed to apply blocks of different sizes (applicable block sizes are listed below). It should be noted again that the chosen mode must be indicated in the header structure of the frame. The 4 possible modes in this example are the following:

-   Coding using three different, dynamically changing block sizes (16×16, 8×8, 4×4).
-   Coding using two dynamically changing block sizes (16×16 and 8×8).
-   Coding using two dynamically changing block sizes (16×16 and 4×4).
-   Coding using two dynamically changing block sizes (8×8 and 4×4).

In one implementation of the method, the user can only select between two- and three-block modes, or, optionally, the system may automatically select the optimal mode. Accordingly, the user may choose from the following options:

-   1. Coding using three different, dynamically changing block sizes.
-   2. Coding using two dynamically changing block sizes.
-   3. Automatically choosing one of the above options.

The choice is usually determined by the available computational capacity, or, optionally, is based on evaluating certain characteristics of the frames to be coded.

FIG. 12 illustrates the block sizes used in the example described above.

Let us discuss now the process of intrapredictive coding with dynamically changing block sizes in more detail.

IV.3. Intrapredictive Coding Using Three Different, Dynamically Changing Block Sizes (I Type)

The size of the Y block is 16×16, or alternatively, four 8×8-pixel sub-blocks or sixteen 4×4-pixel sub-blocks are used.

The size of the UV block is either 8×8 or, corresponding to the partitioning of the Y block, either four 4×4-pixel sub-blocks or sixteen 2×2-pixel sub-blocks are applied (see FIG. 13). It has to be noted that the block size of 2×2 pixels on the UV colour surface is allowed only in “inter” mode.

Because three different block sizes are applied, a method is needed for selecting the optimal size.

If we were to proceed according to the known method, we would have to carry out all necessary transformations using the largest block size, and measure the error between the original and the reconstructed block. Then, in case the error exceeded a limit, the block would be divided into four parts, and the transformations and error comparison repeated with each sub-block. Those sub-blocks that had an error above a given threshold value would again be divided into four sub-blocks and the transformations would be repeated again.

Though this method would be ideal, it would involve a number of transformations that are unnecessary for producing the end result.

In case a block has to be divided into three 8×8 and four 4×4-pixel sub-blocks, one set of transformations should be carried out in both directions on the 16×16 block, four 8×8 transformation sets also in both directions, and four 4×4 transformation sets should also be performed. Of these, the transformation set carried out on the 16×16 block and the set of transformations (DCT and entropy coding-decoding) performed on one of the 8×8 blocks are redundant. Thus, precisely those transformation sets would have to be carried out unnecessarily that have the highest computational load.

So, according to the invention, first a compressibility analysis is performed on the blocks. In one aspect of the invention this is carried out by dividing the block into four sub-blocks and computing the so-called variance for the block using the following formula:

$$variance = \frac{\sum_{j=0}^{M} pixel_{j}^{2} - \left( \sum_{j=0}^{M} pixel_{j} \right)^{2}}{M} \qquad (\text{Equation II})$$

where M is the length of the sub-blocks, and by examining if the conditions variance≤TH8 or variance≤TH16 are fulfilled, where

-   TH8 = the variance threshold allowed for 8×8 blocks
-   TH16 = the variance threshold allowed for 16×16 blocks
-   TH8 and TH16 are empirical constants. As the formula shows, the “variance” value quantifies the amount of visual details in the block.

Upon the user's choice, the Hadamard transform can be activated before the calculation, but the TH constants will be different from those used for the calculation without the Hadamard transform.

If the variance values for all sub-blocks remain within a limit of ±N %, and the sum of said variance values does not exceed the preset threshold TH16, the block can be coded using the largest block size.

If the above condition is not fulfilled, but the variance of a sub-block is smaller than TH8, the given sub-block can be coded in 8×8 size. Otherwise, the 8×8-pixel sub-block should be further divided into 4×4-pixel sub-blocks.
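
The decision rule can be sketched as follows (Python; Equation II is implemented here in its conventional reading, mean of squares minus squared mean, and the ±N% tolerance, TH8 and TH16 are empirical parameters to be chosen, not values taken from the description):

```python
import numpy as np

def variance(sub):
    """Visual-detail measure of a sub-block (the conventional reading of Equation II):
    mean of the squared pixels minus the square of the mean pixel value."""
    p = sub.astype(np.float64).ravel()
    return float(np.mean(p * p) - np.mean(p) ** 2)

def choose_partition(block16, TH8, TH16, N_percent=20.0):
    """Decide the coding size of a 16x16 block from the variances of its four 8x8 quarters."""
    quarters = [block16[y:y + 8, x:x + 8] for y in (0, 8) for x in (0, 8)]
    v = [variance(q) for q in quarters]
    # All quarters within +/-N% of one another and their sum below TH16 -> keep 16x16.
    if max(v) <= min(v) * (1.0 + N_percent / 100.0) and sum(v) <= TH16:
        return "16x16"
    # Otherwise code each quarter at 8x8 if its variance is low enough, else split it to 4x4.
    return ["8x8" if vi <= TH8 else "4x4" for vi in v]
```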

The error caused by quantization should be taken into account when determining values for TH8/16 because, if the quantization error is relatively low, larger block sizes can be utilized with satisfactory results and the subdivision of blocks may become unnecessary.

TH values may, for instance, be determined using the following expression:

TH(i) = THbasevalue(i) * errorfactor(qp), where i=0 . . . 1 and qp=1 . . . MAXQP

The value errorfactor(qp) is taken from a table indexed by qp that is determined on the basis of the quantization factor QP. Said quantization factor QP is provided in this case by the inventive neural control system as will be described in due course, but, alternatively, QP can also be constant or be determined utilizing another known method.

The table of error factors contains values generated from combined quantization error values, with the error factor values decreasing toward greater indices.

In other words, higher quantization means smaller changes in TH and stricter conditions; put yet another way, a higher amount of visual detail in a block and higher quantization cause the block size to converge toward smaller values.
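
A sketch of the threshold computation (the base values and the shape of the error-factor table below are purely hypothetical placeholders; in the method they are derived from measured, combined quantization errors):

```python
MAXQP = 31                                     # hypothetical range of the quantization factor
# Hypothetical error-factor table: values shrink toward larger indices (stronger quantization).
errorfactor = [1.0 / (1.0 + 0.1 * qp) for qp in range(MAXQP + 1)]
TH_BASE = {"TH8": 1000.0, "TH16": 4000.0}      # hypothetical empirical base thresholds

def thresholds(qp):
    """TH(i) = THbasevalue(i) * errorfactor(qp): stricter limits at higher quantization."""
    return {name: base * errorfactor[qp] for name, base in TH_BASE.items()}
```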

Because, as explained below, coding of the block partitioning itself requires a relatively high amount of data, it may be expedient to examine whether it is worthwhile to allow using three different block sizes. In case only two different block sizes are allowed, much less additional information has to be coded for recording the block partitioning data.

So, in one aspect of the invention the block partitioning is analysed over the entire frame, assigning a statistical quantity to each block size (that is, the count of blocks with every allowed block size is determined). If the occurrence rates of all block sizes are approximately the same and we are in the three-block or automatic-choice mode, the process is continued. Otherwise, if most blocks are of one of two dominant block sizes, these dominant sizes are determined, and the method is carried on with the steps described in the chapter entitled Intrapredictive coding using two dynamically changing block sizes (Chapter IV.4. below).

The method operates in two parts.

First, a compressibility analysis is performed over the entire frame on the basis of variance calculations, with the optimal block partitioning being determined as a result. In the second part, the block partitioning is carried out, the predictions found for the optimal partitioning are performed, and finally the coding and the inverse transformations are carried out on each block belonging to the selected block partitioning, utilizing those predictions which proved to be the best.

Although the variance analysis exactly specifies how individual sub-blocks of a given block should be partitioned, this partitioning must be somehow recorded. The most obvious solution applies the so-called Quad-tree structure (illustrated in FIG. 11).

For the description of an entire block two variable levels are needed. The variable at the first level is called L, while second-level variables are designated with the letters ABCD. In case the given block is not partitioned, L is set to 0 (L=0 if the block size is 16×16). If the block is partitioned, L is set to 1. In this case four other bits are needed for describing the sub-blocks.

If a sub-block is not partitioned further (has a size of 8×8 pixels), the value of the associated bit is 0. In case the sub-block is further partitioned (into 4×4-pixel sub-blocks), the value of the associated bit is 1. For example:

L ABCD

-   0 the block is not partitioned
-   1 0000 the block is divided into four 8×8 sub-blocks
-   1 0001 the first quarter of the block is divided into 4×4-pixel sub-blocks, the size of the other blocks is 8×8
-   1 0010 the second quarter of the block is divided into 4×4-pixel sub-blocks, with the others sized 8×8

If the block is partitioned (L=1), there are 16 possible combinations; so in this case the data encoding the partitioning of the block are 5 bits long together with L, while the partitioning data is only 1 bit long (L only) if the block is not partitioned.
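
The descriptor can be produced with a few lines (Python sketch; the bit order, with the first quarter in the least significant position, is inferred from the examples above):

```python
def encode_partition(quarters):
    """Quad-tree descriptor of one 16x16 block: the L bit plus, if L=1, the ABCD bits.
    `quarters` is None for an undivided block, or four entries ('8x8' or '4x4');
    the first quarter is assumed to map to the least significant bit."""
    if quarters is None:
        return "0"                                   # L = 0: the block is not partitioned
    value = sum(1 << i for i, size in enumerate(quarters) if size == "4x4")
    return "1" + format(value, "04b")                # L = 1 followed by the four sub-block bits

assert encode_partition(None) == "0"
assert encode_partition(["8x8"] * 4) == "10000"
assert encode_partition(["4x4", "8x8", "8x8", "8x8"]) == "10001"   # first quarter split further
assert encode_partition(["8x8", "4x4", "8x8", "8x8"]) == "10010"   # second quarter split further
```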

After block partitioning has been completed, transformations pertaining to individual blocks are carried out and the transformed blocks are coded in the entropy coding module.

IV.4. Intrapredictive Coding Using Two Dynamically Changing Block Sizes

If the analysis decides in favour of the option that uses two block sizes, then the two dominant block sizes have already been determined. (Possible sub-block configurations are shown in FIG. 14.) Coding with two block sizes is performed in essentially the same way as described above, with only a few minor adjustments.

In case one of the dominant block sizes is 16×16 (the other size being 8×8 or 4×4) then, provided the variance does not exceed the limit TH16 (that is, the block fulfils the condition in 16×16 size), the block will be coded with a size of 16×16; otherwise it will be divided before coding into 8×8 or 4×4-pixel sub-blocks. If, however, the two dominant block sizes are 8×8 and 4×4, the block will be coded with a size of 8×8 in case the variance values of at least three sub-blocks are smaller than TH8 (that is, the block fulfils the condition), and with a size of 4×4 pixels otherwise.

The advantage of a block partitioning where only two block sizes are allowed is that the 5-bit QT code (the partitioning descriptor) can be replaced by a single-bit code standing for the chosen partitioning (e.g., with a basic block size of 16×16, 0 may stand for a 16×16 block, 1 for four 8×8-sized sub-blocks).

All the subsequent transformations are the same as those already described. To sum up: the analysis chooses two block sizes for the frame, and these will be used for coding the entire frame.

Parts 1, 2, 3 of FIG. 14 illustrate possible sub-block combinations.

Thus, the block coding process proceeds as follows:

-   1. Dividing blocks into sub-blocks according to the block partitioning considered the best by the compressibility analysis.
-   2. Determining the best predicted sub-block for each sub-block, and coding the predicted block.

IV.5. Transformations from Spatial Representation into Frequency Representation

IV.5.1 The discrete cosine transform (DCT) is not new in itself. The basic principles are identical for all block sizes.

The Discrete Cosine Transform:

$y(k) = c(k)\sum_{n=0}^{N-1}\cos\frac{2\pi k(2n+1)}{4N}\,x(n)$

where N is the number of elements in the given block, c(0)=1/√N and c(k)=√(2/N), 1≤k≤N−1, with x(n) being the n-th element of the block to be coded.

The Inverse DCT Transformation:

$x(n) = \sum_{k=0}^{N-1}\cos\frac{2\pi k(2n+1)}{4N}\,c(k)\,y(k)$

These transformations can be implemented as a factorized matrix-vector product, which significantly reduces the amount of calculations.

Currently implemented methods are realized with integer-based 2D transformations. As there exist several well-documented methods for performing DCT transformations on a computer, there is no need to address them here.
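For illustration only, the following C sketch evaluates the 1-D forward and inverse transforms exactly as written in the formulas above. It is a direct O(N²) floating-point version, not the factorized integer implementation mentioned in the text.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Direct 1-D DCT: y(k) = c(k) * sum_n cos(2*pi*k*(2n+1)/(4N)) * x(n) */
    void dct_1d(const double *x, double *y, int N)
    {
        for (int k = 0; k < N; k++) {
            double c = (k == 0) ? 1.0 / sqrt((double)N) : sqrt(2.0 / N);
            double s = 0.0;
            for (int n = 0; n < N; n++)
                s += cos(2.0 * M_PI * k * (2 * n + 1) / (4.0 * N)) * x[n];
            y[k] = c * s;
        }
    }

    /* Inverse transform: x(n) = sum_k cos(2*pi*k*(2n+1)/(4N)) * c(k) * y(k) */
    void idct_1d(const double *y, double *x, int N)
    {
        for (int n = 0; n < N; n++) {
            double s = 0.0;
            for (int k = 0; k < N; k++) {
                double c = (k == 0) ? 1.0 / sqrt((double)N) : sqrt(2.0 / N);
                s += cos(2.0 * M_PI * k * (2 * n + 1) / (4.0 * N)) * c * y[k];
            }
            x[n] = s;
        }
    }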

IV.5.2 Hadamard transform:

$X(b) = \left(\tfrac{1}{2}\right)^{n/2}\sum_{a=0}^{N-1} X(a)\,(-1)^{\sum_{k=0}^{n-1} a(k)b(k)}$

where a ↔ a(n−1)…a(1)a(0), b ↔ b(n−1)…b(1)b(0), and a(k), b(k) = 0, 1.

Similarly to the discrete cosine transform, the Hadamard transform is a variant of FFT, with the great advantage that it comprises only addition and subtraction in matrix form. Thus, it can be performed much faster on a computer than DCT or FFT. It also has an important disadvantage, namely that the function is not continuous. That is why the Hadamard transform causes a visually more conspicuous error with higher-detail blocks than DCT. This makes it suitable to be directly applied only on “flat” (lower-detail) 16×16 blocks. As the 16×16-pixel block size requires the highest amount of calculation, it is preferable to apply the Hadamard transform on 16×16 blocks whenever they need to be transformed. It should be noted here that in a specific embodiment the variance analysis performed according to the invention only allows the application of the 16×16 block size if the given block has sufficiently low detail levels.
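As a sketch of why the transform needs only additions and subtractions, here is an in-place fast Walsh-Hadamard butterfly for a block of N = 2^n samples; the final scaling by (1/2)^(n/2) is shown as a separate floating-point step, whereas an integer implementation would typically fold it into the quantizer.

    #include <math.h>

    /* In-place fast Walsh-Hadamard transform; len must be a power of two. */
    void fwht(double *data, int len)
    {
        for (int h = 1; h < len; h <<= 1) {
            for (int i = 0; i < len; i += h << 1) {
                for (int j = i; j < i + h; j++) {
                    double a = data[j];
                    double b = data[j + h];
                    data[j]     = a + b;   /* only additions ...      */
                    data[j + h] = a - b;   /* ... and subtractions    */
                }
            }
        }
        /* normalisation: (1/2)^(n/2) = 1/sqrt(len) */
        double scale = 1.0 / sqrt((double)len);
        for (int i = 0; i < len; i++)
            data[i] *= scale;
    }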

IV.6 The step performed between the DCT transformation of blocks or sub-blocks and entropy coding is the so-called quantization, during which matrix elements of the DCT-transformed block are modified according to specific guidelines in order to provide for easier coding with the entropy coding module.

The method applied according to the invention is similar to the standard MPEG quantization, and is performed using the following formula:

$qcoeff_{(j)} = \left(\frac{data_{(j)}\cdot 16 + matrix_{(j)}\cdot 0.5}{matrix_{(j)}}\cdot\left(\frac{2^{17}}{QP\cdot 2} + 1\right)\right)/2^{17}$

where qcoeff(j) is the j-th element of the matrix corresponding to the DCT-transformed block after quantization,

-   data(j) is the j-th element of the matrix corresponding to the DCT-transformed block prior to quantization,
-   matrix(j) is the j-th element of the quantization matrix,
-   and QP is the quantization factor (a scalar value).

The inverse quantization:

$data_{(j)} = \frac{(qcoeff_{(j)}\cdot 2 + 1)\cdot matrix_{(j)}\cdot QP}{16}$

The quantization matrix matrix(j) has the same size as the DCT-transformed block, or the original block itself (e.g. 16×16, 8×8, etc.).
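An integer-only C sketch of the two formulas above; the 2^17 reciprocal trick replaces the division by QP, and names such as `quantize_block` are illustrative rather than taken from the original implementation. Rounding of negative coefficients is simplified here.

    #include <stdint.h>

    /* Forward quantization of one transformed block of 'n' coefficients. */
    void quantize_block(const int32_t *data, int32_t *qcoeff,
                        const int32_t *matrix, int32_t QP, int n)
    {
        int32_t recip = (int32_t)((1 << 17) / (QP * 2)) + 1;   /* 2^17/(QP*2) + 1 */
        for (int j = 0; j < n; j++) {
            int64_t t = ((int64_t)data[j] * 16 + matrix[j] / 2) / matrix[j];
            qcoeff[j] = (int32_t)((t * recip) >> 17);
        }
    }

    /* Inverse quantization: data(j) = (qcoeff(j)*2 + 1) * matrix(j) * QP / 16 */
    void dequantize_block(const int32_t *qcoeff, int32_t *data,
                          const int32_t *matrix, int32_t QP, int n)
    {
        for (int j = 0; j < n; j++)
            data[j] = (int32_t)(((int64_t)(qcoeff[j] * 2 + 1) * matrix[j] * QP) / 16);
    }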

The solution provided by the inventive method differs from the known MPEG quantization method in that it chooses the quantization matrix matrix_((j)) depending on the quantization factor. (Known MPEG quantization uses a single matrix_((j)) quantization matrix.)

The correspondence between the quantization matrix and the QP quantization factor is implemented by dividing the entire quantization domain into N subdomains, with a previously specified bandwidth range being assigned to each subdomain. In an embodiment of the invention QP was between 1 and 36, the interval being divided into multiple (in one aspect, four) subdomains according to the following: (1-4), (4-8), (8-20), (20-36). Bandwidth ranges assigned to these subdomains were: 1000-6000 kbit/s, 800-1000 kbit/s, 400-800 kbit/s, 100-400 kbit/s.

That means that an optimised 4×3 matrix table (corresponding to the three different block sizes) was assigned to the divided QP domain, with individual elements of the table being entire matrix_((j)) quantization matrices.

As the size of matrix_((j)) is the same as the block size (4×4, 8×8, 16×16), separate matrices are assigned to each of these block sizes, and that way each row of the table comprises three cells (in other words, three matrices are assigned to each subdomain).

Thus, in case the method modifies QP, the optimal table row (optimal matrices) that corresponds to the given bandwidth is assigned to the new quantization factor.

IV.6. Coding the Quantized Coefficients

In a concrete realization of the inventive compression system, three different methods have been implemented for coding the coefficients. Though the basic principles of these methods are known, they are briefly explained below for the sake of clarity.

IV.6.1. Coding Method Based on the Differences of DC Values of the Discrete Cosine Transform

As it turns out from the name, this method comprises the steps of subtracting DC values of consecutive blocks from each other and coding the resulting differences by an arithmetic coding method (the principles of arithmetic coding are well-known and are detailed later in this document):

X_(dpcm) = X_(i) − X_(i−1)

The method is also called delta pulse code modulation (DPCM), and is based on the observation that the difference between consecutive DC values is usually very small, so the difference can be coded with fewer bits than the values themselves. Because the inventive method utilizes multiple block sizes, it is important to note that only the DC values of blocks of the same size can be expediently subtracted from each other, as the block size determines the magnitude of the DC coefficients.

The arithmetic method codes each block size with dedicated parameters (subdivision of the coding interval, upper/lower limits of the coding interval, etc.).
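A minimal sketch of the DC DPCM step, assuming one running predictor per block size so that only DC values of equally sized blocks are differenced; variable names are illustrative.

    #include <stdint.h>

    enum { BS_4x4 = 0, BS_8x8 = 1, BS_16x16 = 2, NUM_SIZES = 3 };

    /* One previous-DC register per block size (reset to zero per frame). */
    static int32_t prev_dc[NUM_SIZES];

    /* Returns the difference X_dpcm = X_i - X_(i-1) to be sent to the
     * arithmetic coder and updates the predictor for that block size. */
    int32_t dc_dpcm_encode(int32_t dc, int size_class)
    {
        int32_t diff = dc - prev_dc[size_class];
        prev_dc[size_class] = dc;
        return diff;
    }

    /* Decoder-side reconstruction from the coded difference. */
    int32_t dc_dpcm_decode(int32_t diff, int size_class)
    {
        prev_dc[size_class] += diff;
        return prev_dc[size_class];
    }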

IV.6.2 Run-Length Coding of AC Values and Arithmetic Coding of the Resulting Compressed Values

AC coefficients are retrieved by means of the so-called “zig-zag” table (see FIG. 4, 37) and are compressed by the known run-length method.

The run-length method generates ordered pairs (coefficient: occurrence count), where the latter value specifies the number of occurrences of the given coefficient in the data sequence. Because the total number of pairs in a given block cannot be foreseen, either the number of pairs should be specified or an end-of-block code should be inserted after the last pair.

If the coefficient value of the last pair is zero, said last pair need not be coded: it is sufficient to write the end-of-block code into the sequence. The coefficient-occurrence count pairs are written in reverse order (occurrence count: coefficient) into the data sequence to be coded. The reason is that in this manner the zero value of the occurrence count parameter can be used as an end-of-block code (without, of course, the corresponding coefficient) because, if all pairs are valid ones, no combination can occur where the occurrence count is zero, so the code can be safely applied as an end-of-block code.

If the block contained only zeroes prior to coding, only the end-of-block code is coded. In this case the given block will be zeroed out (filled with zeroes) before decoding, during block reconstruction. Pairs produced by the run-length method are finally also coded with arithmetic coding.
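The following hedged C sketch shows the reversed (count, coefficient) pairing with a zero count reused as the end-of-block marker; `emit` stands for whatever downstream arithmetic-coding call consumes the values and is purely illustrative.

    #include <stdio.h>

    /* Placeholder for the downstream arithmetic coder input. */
    static void emit(int value) { printf("%d ", value); }

    /* Run-length code one zig-zag-ordered block of 'n' AC coefficients.
     * Pairs are written as (occurrence count, coefficient); a count of 0
     * serves as the end-of-block code, so a trailing zero run is dropped. */
    void rle_block(const int *ac, int n)
    {
        int i = 0;
        while (i < n) {
            int value = ac[i];
            int count = 1;
            while (i + count < n && ac[i + count] == value)
                count++;
            i += count;
            if (value == 0 && i == n)
                break;                 /* trailing zero run: EOB alone is enough */
            emit(count);
            emit(value);
        }
        emit(0);                       /* end-of-block: occurrence count of zero */
    }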

IV.6.3. Arithmetic Coding of AC Values Using Conventional Parameters and Parameters Predicted by Means of a Neural Network

In this method, AC coefficients are directly coded with the arithmetic method, without intermediate run-length coding. In one of the methods implemented in the coder according to the invention, arithmetic coding is performed such that the level at which the current coefficient is coded is determined by the value of the previously coded coefficient through a modulo function (this is in many ways similar to conventional arithmetic coding without a neural network). This modulo function usually only ensures that, in case more than one identical AC coefficients stand beside each other, the coefficients are coded at the same level. The operating principles of the neural arithmetic coding module are explained in detail later in this document. Suffice it to mention here that in the inventive neural arithmetic coder there are no parameters (interval limits, interval subdivision) which would depend on the AC values. No end-of-block code is used; instead, the position of the last non-zero AC coefficient in the transformed and quantized block is recorded by putting out the actual position plus one. This is needed because, if all AC coefficients were zero in a block, it must be possible to indicate, by writing out a zero, that no further data associated with the given block are coded in the output data stream (in other words, that the block contains only zeroes).

For example:

-   positions: 0 1 2 3 4 5 6 7
-   coefficients: 63 11 21 32 0 0 0 0
-   output: 3 63 11 21 32

where 3 indicates the last active (non-zero) position. This arrangement, however, is incapable of indicating the situation where all data are zeroes, because if a 1 is found at position 0, we should still code a 0 for the last active (non-zero) position. So, the position value is increased by one except when all data are zeroes.

So the output data sequence will be the following: 4 63 11 21 32, or, if everything is zero, only 0.
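A short illustrative C helper for this position-plus-one convention (names are hypothetical). For the example above it emits 4 63 11 21 32, and a single 0 for an all-zero block.

    /* Writes the "last non-zero position + 1" header followed by the active
     * AC coefficients; writes a single 0 when the whole block is zero. */
    void code_ac_positions(const int *ac, int n, void (*emit)(int))
    {
        int last = -1;
        for (int i = 0; i < n; i++)
            if (ac[i] != 0)
                last = i;
        emit(last + 1);                 /* 0 means "block is all zeroes" */
        for (int i = 0; i <= last; i++)
            emit(ac[i]);
    }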

As an alternative to the above method, it could be conceived that a zero is coded for each coefficient at a predetermined level, and a one if the last non-zero value has been reached.

For instance:

-   Level 0: 0001, where the “1” indicates the last non-zero data
-   Level 1: 63 11 21 32

In the coding module implemented according to the invention, one of the two above-described methods is applied for preparing AC data for entropy coding. However, without departing from the scope of the invention, other transformations can also be utilized.

IV.6.4. During entropy coding both method 2 (see Chapter IV.6.2) and method 3 (see Chapter IV.6.3) are executed in test mode, and finally that method is applied which resulted in a shorter data sequence. Of course, the chosen method must be identified somehow in the coded data stream.

Method 1 (for coding DC coefficients) is always carried out, with coded data being output into the output data stream and subsequently being coded with the neural entropy coding module according to the invention. Of course, other known entropy coding methods can also be utilized.

For the sake of clarity, let us consider the format of the output data stream (before it would enter the arithmetic coder) for a single sub-block, utilizing methods 2 and 3 (Chapters IV.6.2 and IV.6.3, respectively).

-   2. |0|PRED|DPCM|AC|EOB, or if everything is zero then |0|PRED|DPCM|EOB
-   3. |1|PRED|DPCM|POS|AC, or if everything is zero then |1|PRED|DPCM|EOB

where

-   the starting bit identifies the coding mode,
-   PRED is the prediction type,
-   DPCM is the DC coefficient coded by delta coding,
-   AC is one or more AC coefficients,
-   POS is the position of the last non-zero AC coefficient,
-   EOB is the end-of-block character.

Format for coding the whole 16×16 block:

-   |QT=0000| [0|PRED|DPCM|AC|EOB], [0|PRED|DPCM|AC|EOB], [0|PRED|DPCM|AC|EOB], [0|PRED|DPCM|AC|EOB]

where QT (quad tree) are data describing block partitioning in case the block has been divided into four sub-blocks and each sub-block applies type-2 coding.

Or:

-   |QT=0001|
-   [0|PRED|DPCM|AC|EOB], [0|PRED|DPCM|AC|EOB] (2×4×4)
-   [0|PRED|DPCM|AC|EOB], [0|PRED|DPCM|AC|EOB] (2×4×4)
-   [0|PRED|DPCM|AC|EOB], (1×8×8)
-   [0|PRED|DPCM|AC|EOB], [0|PRED|DPCM|AC|EOB] (2×8×8)

meaning that the 16×16 block contains three 8×8 sub-blocks that are not subdivided, and the fourth 8×8 block (coming in fact first) has been subdivided into four 4×4 sub-blocks.

IV.7. Intrapredictive Coding of the UV Colour Surfaces

During the known method of MPEG coding, pixel data of a given frame are converted from RGB to the so-called YUV2 format. Here Y stands for the lightness (luma) signal, while UV is the colour difference (chroma) signal.

The physical size of the UV surfaces is scaled back to half relative to Y (though this causes data loss, this loss has proved to be acceptable and does not lead to a significant decrease in quality).

Thus, to each 16×16 Y block, one 8×8 U and one 8×8 V block is assigned (this is illustrated in FIG. 12).

In this manner, when partitioning U and V blocks into sub-blocks, only the 8×8 and 4×4 sizes are desirable for use (as 2×2-pixel sub-blocks do not compress better than 4×4 or 8×8 ones, they are not worth using). The analysis of the block partitioning is done practically the same way as what has already been described, with the important difference that here comparison is performed only with the TH8 variance threshold. It has to be remembered that TH8 is an empirical value with which the variance is compared, the latter computed in a way similar to Equation II. If the variance of the tested block satisfies the condition TH8 ≧ variance, then the block is coded with 8×8 size, otherwise with a size of 4×4 pixels.

For the prediction of U and V blocks only DC prediction is utilized (it is empirically shown that the gain from using other prediction modes would be marginal).

Other transformations are the same as described above. The only difference is in the quantization step size (order of quantization).

The output format of the data block (prior to arithmetic coding) is the following:

-   |0|M|DPCM|AC|EOB, or if everything is zero then |0|M|DPCM|EOB
-   |1|M|DPCM|POS|AC, or if everything is zero then |1|M|DPCM|EOB

where M is the bit indicating block partitioning, e.g. M=0 if the block is not partitioned, M=1 if it is partitioned.

Let us now turn to the coding of so-called “inter” blocks, where a reference block is searched for the block to be coded in another frame.

V. Interpredictive Coding Using Dynamically Changing Block Sizes

V.1 As it has already been pointed out, coding of inter frames is based on temporal redundancy. This means that the current frame is compared with the previous or the subsequent frame, and only the differences between the two frames are coded. Reference search modes of the method are illustrated in FIG. 15. The following alternatives are possible: searching only in the three preceding P-type frames (FIG. 15 a); searching only in the two preceding B-type frames (FIG. 15 b); searching in preceding and subsequent motion-compensated references (FIG. 15 c, in this case B-type frames usually cannot be used as reference).

The fundamentals of the method are identical to known methods: a search is performed for the block to be coded in a search range specified within the reference frame, then the redundancy (the difference block) is coded together with the position of the reference block (or, more exactly, the motion vector), where the difference block is computed by subtracting individual pixels of the block to be coded from respective pixels of the block located at the position determined by the search.

Of the possible search methods the best results would be yielded by the so-called “full search”. Such a search would, however, have an enormous computational load, since it would involve the comparison of the 16×16 block with data in the search range starting from each pixel of the range, when searching for a matching block in all of the possible locations within the search range. Practically, with a search range of 32×32 pixels this would mean 1024*256=262144 additions and subtractions and operations for determining the absolute value of the differences, only for finding a matching 16×16 reference block. Because a frame of 720×576 pixels contains 1620 blocks (with 16×16 pixel size), the overall number of calculations would exceed 424 million. That is why for practical purposes the so-called “fast search” methods are usually applied instead of full search. “Fast search” methods apply few test points (typically less than 64), and have the disadvantage of being successful only if the displacements are small (typically smaller than 1-4 pixels), that is, motions in the video footage are slow.

In case of greater displacements the probability of a successful search decreases rapidly. Another disadvantage of fast search methods is that, even if the search appears successful, it cannot be made sure that the found position is the minimum point (the position of the optimal reference block) within the search range.

The search method implemented in our invention can practically be regarded as a “full search” which, in case of slow motions, has only slightly higher computational load than genuine fast-search methods. Even in case of faster motions (greater displacements), the computational load of the inventive method is only a fraction of the load required by the standard “full-search” method.

V.2. The search method implemented according to the invention is based on the so-called “spiral search”, which is carried out in practice as follows.

V.2.1 A search range is specified in the reference frame. Coordinates of the centre of the search range are identical to coordinates of the centre of the search sample (the block to be coded).

In the next step the search sample, that is, the block 88 to be coded, is divided into smaller sub-blocks (see FIG. 16). In one embodiment of the invention, good results have been produced using 4×2 sub-blocks. Now, variance values are computed for each sub-block, using an expression similar to Equation II. If there are at least two sub-blocks that have a variance greater than the variance values of all the other sub-blocks, the sum of their variances is greater than a predefined empirical constant THV22, and the two sub-blocks are not located adjacently (e.g. sub-blocks 88 b and 88 c), then only these sub-blocks of the 16×16 block 88 are tested during the search.

In case the condition THV22 ≦ variance₁ + variance₂ is not fulfilled (where variance₁ and variance₂ are the variance values of the two biggest-variance non-adjacent sub-blocks), then the above operation is repeated with the four biggest-variance non-adjacent sub-blocks, of which the combined variance is compared with another constant THV24 (this is illustrated by sub-blocks 89 a, 89 b, 89 c, 89 d of block 89).

-   If the condition for the variance of the four sub-blocks is still not true, sub-blocks are merged into 4×4 sub-blocks and the above operations are repeated with two sub-blocks with constant THV42, and, if necessary, with four sub-blocks with constant THV44 (see sub-blocks 90 a, 90 b of block 90), attempting to find in the latter case the four biggest-variance non-adjacent sub-blocks.

If the respective condition is not fulfilled in any of the above situations, the reference search is performed using five 4×4 sub-blocks located in the four corners and the centre of the 16×16 block (FIG. 16).

The constants THV specify the minimum variance that the combined variance of the sub-blocks should equal or exceed in each of the above situations.

The process detailed above can be intuitively summed up as attempting to find the most detailed sub-blocks in the 16×16 block, supposing that, in case they have matching references in the search range, other sub-blocks that are less rich in detail will also have their appropriate references.

The centre of the spiral is specified in the reference frame at a position conforming to the position of the search sample, and then blocks around the specified point are tested in widening cycles with the reduced search sample described above. The search range is scanned with the relative position of sub-blocks of the sample pattern kept constant.

V.2.2.

The test is performed according to the following formula:

$MSE = \sum_{j=0}^{M}\left| A_{(j)} - B_{(j)} \right|^{2}$

where M is the block length.

MSE is the so-called Mean Square Error. Every time the MSE value is found to be smaller than the current smallest calculated MSE, the new MSE value, together with the current position, is written into a temporary buffer, e.g. a 64-element circular buffer. If the buffer is full, data are overwritten starting from the first element.
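A compact C sketch of this bookkeeping; `record_if_better` would be called once per spiral-search test position with the MSE obtained from the formula above, and all names here are illustrative.

    #include <stdint.h>

    #define CAND_SLOTS 64

    typedef struct { int x, y; uint32_t mse; } Candidate;

    static Candidate cand[CAND_SLOTS];
    static int       cand_head = 0;            /* next slot to overwrite */
    static uint32_t  best_mse  = UINT32_MAX;

    /* Stores the tested position in the circular buffer whenever it
     * improves on the smallest MSE found so far. */
    void record_if_better(int x, int y, uint32_t mse)
    {
        if (mse < best_mse) {
            best_mse = mse;
            cand[cand_head].x   = x;
            cand[cand_head].y   = y;
            cand[cand_head].mse = mse;
            cand_head = (cand_head + 1) % CAND_SLOTS;   /* wrap when full */
        }
    }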

The search method is preferably fine-tuned by analyzing the obtained MSE value from other aspects as well. For instance, in case the search finds multiple positions that satisfy the conditions for a match (e.g. MSE is smaller than the maximum allowed error), and these positions are located in the same direction (on the same side) seen from the starting point, and further the error increases as the search leaves the region of these positions, then the search can be aborted, because it is highly probable that the search is moving away from the optimum point.

At the end of the search, after every point has been tested or the search has been aborted, the circular buffer contains those positions that are probably the best candidates for a reference position. (Of the 1024 possible positions only a smaller number are stored, e.g. according to the above example, 64 positions.) Stored positions are then tested again, but this time with the whole 16×16 block, and the position yielding the smallest error is determined.

V.2.3. Now, the SAD₁₆ value (Sum of Absolute Differences, the index refers to the block size) computed from the coded block and the newly established reference block is compared with an empirical constant MAXSAD16. SAD is computed using the following formula:

$SAD = \sum_{j=0}^{M}\left| A_{(j)} - B_{(j)} \right|$

where M is the block length, and A and B stand for elements of the reference block and the block to be coded.

The constant MAXSAD16 specifies the maximum acceptable error of the reference matched to a 16×16 block. In case the error of the block found by the search is not acceptable, the reference search is repeated in the frame preceding or following the current frame (depending on frame type). If none of the reference frames gives acceptable results, the one that yielded the smallest error is chosen. Now the block is partitioned into four 8×8 sub-blocks, and SAD₈ values are computed and compared with the (empirical) constant MAXSAD8 for each sub-block.

Thereafter, positions contained in the buffers associated with the reference frames are tested, and the reference frame and position are selected where the most 8×8 sub-blocks yielded acceptable results (fulfilled the SAD₈ ≦ MAXSAD8 condition).

For each sub-block with excessive error the search is repeated in a reduced search range using a block size of 8×8, starting from the position of the given sub-block. If the error still exceeds the limit, the sub-block found the best by the 8×8 search is subdivided into 4×4 sub-blocks, and the search is repeated on sub-blocks satisfying the condition SAD₄ > MAXSAD4.

If all the sub-blocks of a particular block had to be subdivided into 4×4 sub-blocks, then for this particular block the reference search can be repeated in all reference frames, in case it is unsuccessful in the current reference frame.

In case the errors of individual sub-blocks are still found excessive after the search has ended, blocks where the search was unsuccessful are marked. These are to be coded as intra blocks in subsequent steps of the method.

Searching in all reference frames means that, if the search stops with a satisfactory result in e.g. the second frame, then it is not continued in the third frame.

V.3. Processing Blocks in 16×16 Partition

If the reference search is unsuccessful in the current frame in case of a 16×16 block (no reference block is found), and the current frame is P-type, the search is repeated in the preceding P-type reference frames, with a maximum depth of 3 frames. If the search is successful, the number of the reference frame is recorded, and the search is finished. Otherwise, the method selects the frame with the smallest error, divides the block into four 8×8 sub-blocks, and continues the search with those sub-blocks where the error exceeds the limit. If the current frame is B-type, the search is first performed in the following P frame, then in the three preceding P frames. If, in case of a B-type frame, the search is unsuccessful in all reference frames, an averaged reference block is produced from the following frame and one of the preceding frames by interpolating the positions of the best reference block candidates found in the following and in one of the preceding frames, using the simple expression applied in the MPEG method. If the square error between the produced interpolated block and the original block remains too large, the reference frame where the error was the smallest is selected, and the block is divided into four 8×8 sub-blocks.

V.4 Processing Blocks in 8×8 Partition

The processing of 8×8 blocks is almost identical to the processing of 16×16 blocks, with the difference that if the search is still unsuccessful at the end of processing, blocks are subdivided into sub-blocks of 4×4 pixels.

V.5 Processing Blocks in 4×4 Partition

The processing of 4×4 blocks is almost identical to the processing of 8×8 blocks, the sole difference being that, in case the search is still unsuccessful, the erroneous block is marked as an intra block.

V.6. Extended Search with ½, ¼ and ⅛-Pixel Resolution

After the processing of the block has ended with full pixel resolution, the search is carried on with ½, ¼ and ⅛-pixel resolution (search in a filtered/interpolated reference). The sole exception is the interpolation mode of B-type frames, where the ½, ¼ and ⅛-pixel resolution search must be performed before the interpolation. Because the search with ½, ¼ and ⅛-pixel resolution is not essentially different from known methods which have been described earlier in this document, the process is not explained here.

In the implemented method, a maximum of three reference frames are assigned to each frame. These are different only in their size and the filtering method by which they were created (see FIG. 17).

Each search process uses an associated reference frame. The search with ½ and ¼-pixel resolution uses the 1:4 ratio interpolated reference frame, while the ⅛-pixel resolution search utilizes the 1:8 ratio interpolated reference frame. The full-pixel search uses the 1:1 ratio reference frame. Because the application of interpolated reference frames and filters is known per se, it is not necessary to detail it here.

Similarly to previous situations, a spiral search is applied, but here the number of test points is under 20 due to the significantly reduced search range. Usually, the search can only be aborted if SAD=0 (complete match between the current block and the reference block). The search can be performed with each block size, but only without block reduction (that is, only whole blocks and not sub-blocks are compared), which means that testing a 16×16 block involves 256 subtraction operations.

After the blocks to be coded have been subtracted from matched reference blocks, the resulting reduced blocks are transformed from spatial representation into frequency representation.

V.7.

After the process of finding the reference block for the entire 16×16 block and partitioning the reference block has been successfully completed, the block to be coded is also divided into equal-sized sub-blocks, and the sub-blocks are subtracted from sub-blocks of the reference block. The differences are then subjected to discrete cosine transformation and quantization, and AC/DC prediction is applied on the coefficients contained in the resulting blocks. The prediction method, similarly to the prediction process of intra blocks, generates the differences between DC values of transformed blocks, and applies horizontal/vertical prediction on the first horizontal/vertical lines of AC coefficients. The difference between this method and the previous one is that the intra prediction described above subtracts the actual pixel values of neighbouring pixels from pixel values of the block to be coded, whereas in this case elements of the neighbouring DCT-transformed blocks are subtracted from elements of a DCT-transformed block. This method further improves the coding efficiency of transformed blocks during entropy coding.

The decision about which prediction mode (horizontal or vertical) should be applied is made on the basis of the differences between DC coefficients of the blocks surrounding the transformed block, where the relative positioning of the blocks may be the following:

-   C B
-   A X

where X stands for the transformed block that is being processed, while A, B, and C are the surrounding blocks.

The prediction mode (pred) is selected by the following conditional expression:

pred = abs(A−C) < abs(C−B)

where A, B and C are DC coefficients of the blocks surrounding the block that is being processed.

If the condition is true, vertical prediction will be applied, otherwise the horizontal mode is selected. Because the correlations determining the prediction mode are available when block reconstruction is performed, it is not necessary to record the selected mode in the output.
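A one-function C sketch of this decision rule; since it uses only already-decoded neighbouring DC values, a decoder can run the identical function instead of reading a mode flag. The neighbour naming follows the layout above (A = left, B = above, C = above-left).

    #include <stdlib.h>

    enum pred_mode { PRED_HORIZONTAL = 0, PRED_VERTICAL = 1 };

    /* A = left neighbour DC, B = upper neighbour DC, C = upper-left DC. */
    enum pred_mode select_pred_mode(int A, int B, int C)
    {
        return (abs(A - C) < abs(C - B)) ? PRED_VERTICAL : PRED_HORIZONTAL;
    }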

V.8.

In this section a possible data format and coding method is presented for data describing block partitioning.

The description of the block partitioning of the block to be coded is similar to what was presented in the section dealing with intra frames.

A maximum 2-bit long value, L, describes the partitioning of a 16×16 block. Bit 1 is 0 if the block is not divided, and 1 if it is divided. Bit 2 has the value 1 if the block is labelled intra. This latter happens when each of the 8×8 sub-blocks is subdivided into 4×4 sub-blocks and more than 50% of these sub-blocks have been labelled as intra blocks. In that case the system will code the given block according to the process for coding intra blocks.

The data prior to entropy coding are the following:

RFIDX is the frame number of the applied reference frame

-   MV is the motion vector,
-   DC is the first coefficient of the discrete cosine transform, and
-   AC designates the other DCT coefficients,
-   IP indicates the interpolation mode for B frames,
-   I stands for intra mode,
-   P is the number of the intra mode prediction.

Modes for P Frames are the Following:

-   Mode 0: |L|RFIDX|MV|DC|AC
-   Modes 1 . . . 15: |L|QT|RFIDX|I|MV|DC|AC . . . I|MV|DC|AC

Modes for B Frames are the Following:

-   Mode 0: |L|RFIDX|IP|MV|DC|AC
-   Modes 1 . . . 15: |L|QT|RFIDX|I|IP|MV|DC|AC . . . I|IP|MV|DC|AC

If I=1, the description of the given block changes to: |I|P|DC|AC.

Let us see two concrete examples for P-type frames:

A possible data stream for the first example (L=1, QT=0001, reference frame 2; the first 8×8 quarter is divided into four 4×4 sub-blocks, one of which is an intra block, followed by three undivided 8×8 sub-blocks):

    1 | 0001 | 2 |
    0 MV4 DC4 AC4 | 1 P DC4 AC4 | 0 MV4 DC4 AC4 | 0 MV4 DC4 AC4 |
    0 MV8 DC8 AC8 | 0 MV8 DC8 AC8 | 0 MV8 DC8 AC8

and for the second example (L=1, QT=0001, reference frame 0; four 4×4 sub-blocks and three 8×8 sub-blocks, none of them intra):

    1 | 0001 | 0 |
    0 MV4 DC4 AC4 | 0 MV4 DC4 AC4 | 0 MV4 DC4 AC4 | 0 MV4 DC4 AC4 |
    0 MV8 DC8 AC8 | 0 MV8 DC8 AC8 | 0 MV8 DC8 AC8

(Also in this case, the variables “AC” usually represent more than one data element.)

Having coded the Y colour surface of the 16×16 block, the next step is coding the UV colour surfaces. The size of the block is in proportion with the size of the Y block and the sub-blocks thereof.

In case the Y block had a size of 16×16 pixels, the UV block is sized 8×8 pixels; if the Y block is sized 8×8, then the UV block is sized 4×4; and finally, to a 4×4 Y block the corresponding UV blocks are sized 2×2.

Otherwise, the partitioning of U and V blocks is identical to the partitioning of the Y block and is performed in the same manner. Thus, during the coding process only the DC and AC values must be sequentially written out before entropy coding, because all the other data (block partitioning descriptors, block identifiers, etc.) are already described in the Y block.

V.9. Prediction of Macroblocks (Motion Vectors and Inter Blocks)

For the optimal coding of motion vectors (that is, vectors indicating motion relative to the reference block), the motion vectors are preferably represented in the shortest possible form. This can be provided by subtracting from the current motion vector the motion vector belonging to the block located beside, above, or diagonally above the current block. Of the three possible neighbouring motion vectors, the one closest to the mean value is selected, or in other words, the one that is bigger than the smallest and smaller than the biggest (in effect, the median of the three).
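A short C sketch of such a predictor, assuming the left, above and diagonal neighbour vectors are available; here the "larger than the smallest and smaller than the biggest" rule is applied per component, which is one possible reading of the selection described above.

    typedef struct { int x, y; } MV;

    /* Median of three scalars: the value that is neither smallest nor largest. */
    static int median3(int a, int b, int c)
    {
        if (a > b) { int t = a; a = b; b = t; }     /* now a <= b       */
        if (b > c) { b = c; }                       /* b = min(b, c)    */
        return (a > b) ? a : b;                     /* max(a, b)        */
    }

    /* Predicted motion vector from the left (A), above (B) and diagonal (C)
     * neighbours; the coded residual is the current vector minus this prediction. */
    MV predict_mv(MV A, MV B, MV C)
    {
        MV p = { median3(A.x, B.x, C.x), median3(A.y, B.y, C.y) };
        return p;
    }

    MV mv_residual(MV current, MV pred)
    {
        MV d = { current.x - pred.x, current.y - pred.y };
        return d;
    }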

V.10. Finally, data are coded using the entropy coding module. We will now turn our attention to the description of the inventive binary multi-level arithmetic coding module.

VI. The Neural Arithmetic Coding Module

The entropy coding module according to the invention utilizes a binary adaptive technique. This means that input data are processed bit by bit, as a function of the occurrence frequency and the pattern of bits already received.

VI.1. The primary design aim for the method was achieving the best possible compression with relatively simple calculations. The inventive method can be implemented without using divisions, performing only two multiplications and a few additions and logical operations. All other operations are based on integers.

The inventive method is a so-called arithmetic coding method. Arithmetic coding is a method known per se. The basic principle of arithmetic coding involves the modification of the upper and lower limits of an interval (range) depending on received data of the data sequence to be coded. An arithmetic coder is truly efficient only if the distribution of incoming data is known to some extent, in other words, if the probability estimate for the value of the next input data element is known to some extent.

The following short algorithm performs arithmetic coding on a binary data stream (containing only 0 and 1 symbols). The current data to be coded (the next bit of the data stream) is stored in the “bit” variable. The upper and lower limits of the coding interval are the variables “area[0]” and “area[1]”.

    area[inverz(bit)] = area[0] + ((area[1] - area[0]) * prob) / 2^16 + bit
    if ((area[1] - area[0]) < 256) {
        outdata = area[1] / 2^24
        area[0] = area[0] * 2^8
        area[1] = area[1] * 2^8
    }
    prob = calc_next_probe(bit)

The key factor in increasing the efficiency of coding is how the value of the “prob” variable (in the following: probe value or probe) is determined. The probe value is returned in the above example by the “calc_next_probe(bit)” function.

The array “area[0,1]” contains two 32-bit values, namely the “area[0]” and “area[1]” variables that store the upper and lower limits of the coding interval. As it is known from the theory of arithmetic coding, the interval (area[0], area[1]) is scaled with the probe value. In known methods the probe value is usually determined as a function of the frequency of occurrence of previously arrived bits. Depending on the value of the newly received bit, either the lower or the upper limit of the coding interval is modified with the scaled value. The interval can be modified (new bits can be coded) until the difference of the upper and lower limits becomes smaller than 256. In principle, other values can also be used, but for treating the overflow, 256 appeared to be the most practical value. Accordingly, when the difference of the upper and lower limits becomes less than 256, the 8 most significant bits are written out to the output data sequence, and the variables representing both the lower and upper limits are shifted to the left by 8 places.
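For illustration only, the same renormalising coder step expressed in C, assuming 32-bit interval limits, a bit-valued input and a `write_byte` output callback (all names are hypothetical); the probe value lives on a 16-bit scale as in the pseudocode above.

    #include <stdint.h>

    /* Code one bit with the current probe value (0..65535) and renormalise.
     * area[0] / area[1] are the lower / upper interval limits. */
    void code_bit(uint32_t area[2], int bit, uint32_t prob,
                  void (*write_byte)(uint8_t))
    {
        uint32_t split = area[0] +
            (uint32_t)(((uint64_t)(area[1] - area[0]) * prob) >> 16);

        /* Move the limit opposite to the received bit, as in the pseudocode. */
        area[!bit] = split + (uint32_t)bit;

        if (area[1] - area[0] < 256) {
            write_byte((uint8_t)(area[1] >> 24));   /* 8 most significant bits */
            area[0] <<= 8;
            area[1] <<= 8;
        }
    }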

This can be written in mathematical form as:

    area[1] = area[1] * 256
    area[0] = area[0] * 256

VI.2. The probe value is determined as follows:

We introduce the concept of a frequency table, and define two functions for describing the occurrence rate of 0-s and 1-s in the incoming data sequence, f(0) and f(1). Received bits are fed into a buffer with a length of m bits, where 2^m = N, so the base-2 logarithm of N provides the window width.

Bits arriving sequentially into the buffer constitute the “window” variable (shifting left the contents of the buffer as needed). The “window” variable is truncated to a length of m bits to form an index (stored in the “index” variable) that points to one of the rows in the frequency table. In the example presented here the frequency table has 512 rows. Elements of the frequency table are specified by the variables FreqTbl[index].f0 and FreqTbl[index].f1. These variables show how many times the received bit has been 0 or 1 when the bit combination was the same as the bit combination currently stored in the buffer (in other words, the current bit combination is considered as an index pointing to a given row of the table).

    count = count + 1
    if (count >= log2(N + 1)) {
        count = 0
        window = 0
    }
    window = window * 2 + bit
    FreqTbl[index].f0 = FreqTbl[index].f0 + (2 * bit)
    FreqTbl[index].f1 = FreqTbl[index].f1 + (2 - (2 * bit))
    index = window mod N
    sum = FreqTbl[index].f0 + FreqTbl[index].f1
    prob = (FreqTbl[index].f0 * FracTbl[sum]) / 2^10
    if (sum > 256) {
        FreqTbl[index].f0 = FreqTbl[index].f0 / 2
        FreqTbl[index].f1 = FreqTbl[index].f1 / 2 + 1
    }

At the beginning of the compression, all variables (except N, of which the base-2 logarithm gives the window length) are filled with zero. The incoming bit overwrites the least significant bit of the buffer (window), and that element of the frequency table which was addressed by the previous value of the buffer is updated according to the newly arrived bit. As it will soon become apparent, the emphasis is on the requirement that the previous value of the buffer should be associated with the bit arriving one step later, that is, the bit currently being processed. It is precisely this feature that makes it possible to “predict” the value of the incoming bit during the operation of the system. The previous value of the buffer is stored in the variable “index”, for which N is an upper limit.

In the next step of the method, the probe value (the value of the “prob” variable) is calculated for the next bit. The exact formula for that should be

$prob = \frac{FreqTbl[index].f0}{sum}$

but this expression is not applied directly, because the result would be a fraction. The calculation (the division) would require real arithmetic, which is too slow for our purposes.

Instead, a fraction table of 512 elements is used (represented by the “FracTbl[sum]” variable), of which the appropriate element is selected by the sum of bit frequencies in the corresponding row of the frequency table (cf. the “sum” variable in the above algorithm). To determine the probe value, the f0 value in the appropriate row of the frequency table is multiplied by the value retrieved above from the fraction table, and then the product is scaled with a constant, e.g. the product is shifted 10 bits to the right. Thus, the probe value is obtained, which will fall into the interval 0 . . . 65535, which is in turn analogous with the interval 0 . . . 1. As the fraction table contains 512 elements (actually the appropriately scaled values of 1/sum, where the “sum” variable is used as an index to the table at the same time), it should be made sure that the “sum” value does not exceed this value.

This is achieved by testing the “sum” value and by re-scaling f0 and f1 if “sum” is greater than 256 (because in effect the value of the “prob” variable is determined by the proportion of f0 and f1 and not by their absolute values, they can be re-scaled).

As it turns out from the algorithm, the method is fairly simple. Since the divisors are powers of 2, divisions can be substituted by right-shift operations. The MOD operation can also be substituted by a logical AND.
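A compilable C sketch of this frequency-table probe, assuming N = 512 and a precomputed FracTbl of scaled 1/sum values; shifts and a logical AND replace the divisions and the MOD, as described above. The periodic window reset of the pseudocode is omitted here for brevity.

    #include <stdint.h>

    #define N 512                        /* number of window states (power of two) */

    typedef struct { uint16_t f0, f1; } FreqRow;

    static FreqRow  FreqTbl[N];
    static uint16_t FracTbl[512];         /* scaled 1/sum values, indexed by sum   */
    static uint32_t window_bits, index_prev;

    /* Update the model with the received bit and return the probe for the next bit. */
    uint32_t next_probe(int bit)
    {
        /* Update the row addressed by the previous window contents. */
        FreqTbl[index_prev].f0 += (uint16_t)(2 * bit);
        FreqTbl[index_prev].f1 += (uint16_t)(2 - 2 * bit);

        window_bits = window_bits * 2 + (uint32_t)bit;
        index_prev  = window_bits & (N - 1);              /* MOD replaced by AND  */

        uint32_t sum  = FreqTbl[index_prev].f0 + FreqTbl[index_prev].f1;
        uint32_t prob = ((uint32_t)FreqTbl[index_prev].f0 * FracTbl[sum]) >> 10;

        if (sum > 256) {                                   /* keep sum in range    */
            FreqTbl[index_prev].f0 >>= 1;                  /* shift for division   */
            FreqTbl[index_prev].f1 = (uint16_t)((FreqTbl[index_prev].f1 >> 1) + 1);
        }
        return prob;
    }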

VI.3. As it has already been mentioned, the method performs coding at different levels. In practice this affects only the probe variable. In other words, the same coding module can be used for all data types, only the parameters (N, window size, limits of the coding interval) should be adjusted. Different levels are assigned to each data type, with each level being divided into a plurality of sub-levels according to the requirements of the given data type. For instance, for coding the results of the 4×4 discrete cosine transform operation a specific type level is defined, with different sub-levels being assigned to the coding of AC and DC coefficients.

The constant N determines the window size, in other words the number of previously received bits that are tested together with the bit currently being received. This factor strongly affects coding efficiency, but it also increases the required memory, because more memory is needed if N is increased.

The method presented above is a fast integer-arithmetic variety of known methods. With parameters tuned appropriately, compression efficiency is 10% higher than that of the VLC method used in MPEG systems. So far, only methods using far more complex probe algorithms have performed significantly better than that. To improve efficiency, the frequency table should also be significantly sized up. Both the chosen probe algorithm and the frequency table size affect the execution time of the method.

The best known method for determining the probe value for data structures utilized in the present invention would be the so-called dynamic Markov model. This, however, works efficiently only with a frequency table of at least 64 kB. If all sub-levels applied for coding were set to this size (that is, if all variables and variable types corresponding to different block sizes were to be coded in this manner), more than 16 MB of memory would have to be allocated for the frequency tables alone.

These problems have made it necessary to devise a new, significantly more effective probe algorithm.

VI.4. Arithmetic Coder with Neural Probe

VI.4.1. An important drawback of arithmetic coding is that for optimal-efficiency coding the occurrence probability of individual symbols appearing in the data to be coded should be known. In principle, it would be possible to take into account multiple symbols that have already occurred. It would be even more effective to watch not only individual symbols, but occurrence frequencies of different symbol groups. This, however, would require the storing of a very high number of frequency values. This may be alleviated to some extent by storing, instead of all the possible combinations of symbols which have already occurred (contexts), only those symbol combinations that have in fact occurred.

That means that an extra symbol (escape) should be introduced to indicate the occurrence of a new symbol combination.

Known coding methods, primarily the PPM (prediction by partial match) method, examine symbol combinations of varying length. When a received symbol is coded, first the longest allowed combination is tested. The newly arrived symbol is added to the stored symbol group, and a search is performed with the current symbol group length to establish if the current group has already occurred. For instance, if the group length is 4, then the three most recent symbols will be tested together with the newly arrived one. If the symbol combination has already occurred, it is coded using the momentary or constant probability value assigned to that given symbol combination. If, on the other hand, the combination has not yet occurred, an escape symbol is coded to indicate (for the decoder) that the combination is new, and the search is carried on with a shorter combination length.

In case the received symbol has not been coded in any previous combination, it is coded using the average of the occurrence probabilities assigned to individual symbols. After the coding has been completed, the counters measuring the occurrence counts of symbol combinations (that is, quantifying the occurrence probabilities thereof) are updated, with new combinations added if necessary.

Since this method is slow and has a relatively high memory load, it is not suitable for coding video data directly.

However, according to the invention, the principle of examining varying-length symbol combinations is carried over to the neural coding method applied for the present invention. It has to be noted that the idea of applying a neural network for determining the arithmetic probe value is not new. A method utilizing a neural network was implemented in 1996 by Schmidhuber and Heil. Their method, in a manner similar to PPM, watches the co-occurrence of previously received symbols and the newly arrived one, and determines the probe value accordingly. With this known solution it is not necessary to use different symbol group lengths for the search, as only those inputs of the neural network will be active where there is correlation between the currently tested combination and one of those that were “taught” earlier. That way, selection (recognition) of such symbol combinations is performed automatically. This known method is, however, of little use for practical purposes because the training process is very long. For instance, in one test case the training of the network to recognize correlations in approx. 20 kB of data required two days.

Matthew V. Mahoney (Florida Institute of Technology) carried the application of neural network technology over to binary arithmetic coding (where only 0 and 1 are the symbols to be coded), using the on-line training method known from neural network theory and applying an adaptive learning rate instead of a constant one.

However, even this improved method is not good enough to be directly applicable for video encoding, because the memory requirement necessary for optimal coding is too high. Mahoney's original method applied more than 258 kB of memory for coding a single level. If that is applied at 128 levels (for 128 different types of data structure, taking into account the predictions, block sizes, DC and AC coefficients, etc.), the total memory needed would be more than 32 MB. If, however, only a single level were defined for video encoding, the method would be less efficient than the conventional coding method using multiple levels.

Having considered all these problems, an inventive method is devised for determining the neural probe value. The method according to the invention can maximise entropy using as little as 1-2 kB of memory. It has been found that it is worth increasing the memory only up to approx. 16 kB, above which the improvement in coding efficiency is negligible.

VI.4.2 The method is explained in detail on the following pages. Compared to the arithmetic method presented above, the only difference is that the calculation of the probe value (the function returning the value of the “prob” variable) has been replaced by a neural probe.

Incoming bits to be coded are fed into a shift register (buffer) with a size of 32, 64 or even more bits (in the following, an example comprising a 32-bit register is presented). The contents of the register constitute the so-called window. Now, the value stored in the register (treated as a 32-bit integer) is dynamically divided into N parts using a hash function.

Consider the following definition of the hash function (provided as an example only):

-   adr1 = register mod H0
-   adr2 = H0 − (H0*k) + ((register/64) mod H1)
-   adr3 = (H0+H1) − (H1*k) + ((register/16384) mod H2)
-   adr4 = (H0+H1+H2) − (H2*k) + ((register/4194304) mod H3)

H3 is computed from previous address values, so that the range of length H3, starting from (H0+H1+H2)−(H2*k), extends to the end of the remaining memory (until 2048). The result of the hash function is shown in FIG. 18.

The “register” variable is the binary value currently stored in the register 150, and H0-H2 are predetermined values. For instance, the values of H0-H2 can be 256, 512, 1024, from which the value of H3 was calculated as 1536 in a concrete situation. The factor k is given by the output of the neural network. The factor k has a default value and can fall into the interval between 0 and 1.
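A C sketch of this address generation, assuming the example constants H0=256, H1=512, H2=1024 and a 2048-row table; representing k in 8.8 fixed point so the routine stays division-free is an assumption of this sketch, not taken from the text.

    #include <stdint.h>

    #define TABLE_ROWS 2048
    #define H0 256
    #define H1 512
    #define H2 1024

    /* k_q8 is the network output k in 8.8 fixed point (0..256 ~ 0.0..1.0). */
    void hash_addresses(uint32_t reg, uint32_t k_q8, uint32_t adr[4])
    {
        adr[0] = reg % H0;
        adr[1] = H0 - ((H0 * k_q8) >> 8) + ((reg / 64) % H1);
        adr[2] = (H0 + H1) - ((H1 * k_q8) >> 8) + ((reg / 16384) % H2);

        /* The last subrange extends to the end of the table (H3 = remaining rows),
         * so larger k enlarges the overlapping unions between neighbouring ranges. */
        uint32_t base3 = (H0 + H1 + H2) - ((H2 * k_q8) >> 8);
        uint32_t H3    = TABLE_ROWS - base3;
        adr[3] = base3 + ((reg / 4194304) % H3);
    }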

The hash function presented above maps addresses adr1-adr4 to rows of table 155, where table 155 consists of 2048 rows. More precisely, the mapping is to subranges 151-154, which means that adr1 points to an element (table row) in subrange 151, adr2 to a row in subrange 152, adr3 to a row in subrange 153, and adr4 to a row in subrange 154. An important feature of the inventive solution is that the subranges may overlap, thereby forming unions 156-158; this embodiment comprises three unions 156-158. As can be understood from the above expressions, the sizes of the unions are determined by the factor k. The importance of the unions increases from the least significant toward the most significant bits. The unions play an important role in the recognition of recurring bit sequences in the input stream. The role of the unions is also important because they make it possible for two addresses to point to the same row of table 155. Without going into much mathematical detail, suffice it to note here that, because the input is processed sequentially, if the first address points to a given neuron, then the weight thereof will change during the training process. In case another address selects the same neuron, the weight will change again, together with the value of the bit frequency function assigned to it. Accordingly, the value of the bit frequency function associated with the neuron will also be modified twice. During the operation of the system both inputs will have the same weights.

Because the factor k is assigned a value by the output of the neural network, and the actual lengths H1-H3 of the subranges 152-154 are determined by k (indirectly, through union sizes, because union sizes affect the length of the subranges), the partitioning of table 155 into subranges 151-154 changes dynamically according to the output of the neural network after each received bit.

The N addresses (in this case N=4) defined by the hash function select N elements (rows) of the table 155. Each row of the table 155 contains two weight functions, the quantization weight function W_(i,Q) and the scale weight function W_(i,S) (or, in case the scaling factor S need not be generated, only W_(i,Q)), and a frequency pair f(0)_(i), f(1)_(i), defined in the same way as in the case of the discrete algorithm (that is, frequency values are updated depending on the bit value each time a bit is received). The selected weight functions are modified during the training process as follows:

$W_{i} = W_{i} + error \cdot gain \cdot eta \cdot \frac{f(0)_{i} + f(1)_{i}}{f(0)_{i}\cdot f(1)_{i}}$

where i is the index of the addressed row of the table, “error” is the difference of the predicted and the actually received bit, “gain” is the gain factor, “eta” is the learning rate, and f(0), f(1) are the bit frequencies defined above. In effect, the knowledge base of the N-input neural network applied in the method is constituted by the weight functions and frequency values stored in table 155.

During operation, the output of the network according to this example is given by the formula

$out = \exp\left(\sum_{i=1}^{N} W_{i}\right)$

where the index i runs from 1 to N, i.e. the output is summed for all the selected weight functions. The output of the network can be defined so that the k and “prob” values themselves appear at the output.

Again skipping the more detailed mathematical analysis, the operation of the neural network can be sketched as follows:

The probe value is 0 (corresponding to the probability 0.5) when the first bit arrives. Then the network calculates the error (error = bit − probe), and “teaches” the error to the neurons assigned to the previous bit value. (In the first step these values are irrelevant. Because there are no previous data, all addresses are 0, so the 0-th neuron will be assigned to the input.) Next, the system generates new addresses from the current value of the register (buffer). Weight functions (zero in the first step) of the neurons selected by the addresses are then summed up and the exponential of the sum is calculated (the result in the first step is zero as well), which becomes the new probe value.

The next probe interval is −0.5 . . . +0.5, so the current probe value mapped onto the 0 . . . 1 interval will be 0.5. For the following incoming bit the above process is repeated, this time with valid addresses, with the weight functions of the previously selected neurons being modified on the basis of the error value. The process is the same for the factor k.
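The following C sketch is offered purely as one interpretation of the cycle just described: it keeps one weight per table row and updates the N selected rows with the error-driven rule given above. The squashing of the summed weights into a probe in −0.5 . . . +0.5 and the +1 guard in the frequency factor are assumptions of this sketch, since the text does not spell out the exact output mapping.

    #include <math.h>
    #include <stdint.h>

    #define TABLE_ROWS 2048
    #define N_ADDR     4

    typedef struct { double w; uint16_t f0, f1; } Neuron;

    static Neuron net[TABLE_ROWS];

    /* One coding cycle: learn from the bit just received on the previously
     * selected rows, then produce the probe for the next bit from the rows
     * selected by the new addresses. Returns the probe mapped onto 0..1. */
    double neural_probe(const uint32_t prev_adr[N_ADDR],
                        const uint32_t new_adr[N_ADDR],
                        int bit, double prev_probe,
                        double gain, double eta)
    {
        double error = (double)bit - prev_probe;

        /* Training step on the neurons addressed in the previous cycle. */
        for (int i = 0; i < N_ADDR; i++) {
            Neuron *n = &net[prev_adr[i]];
            n->f0 += (uint16_t)(2 * bit);
            n->f1 += (uint16_t)(2 - 2 * bit);
            double freq = (n->f0 + n->f1) / (double)(n->f0 * n->f1 + 1); /* +1 avoids /0 */
            n->w += error * gain * eta * freq;
        }

        /* Prediction step: sum the newly selected weights and squash. */
        double sum = 0.0;
        for (int i = 0; i < N_ADDR; i++)
            sum += net[new_adr[i]].w;
        double probe = 1.0 / (1.0 + exp(-sum)) - 0.5;   /* assumed mapping to -0.5..+0.5 */
        return probe + 0.5;                              /* map onto 0..1 */
    }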

As we have already mentioned, the gain and the learning rate of the system can be dynamically adjusted. In a preferred embodiment the gain is modified only in case the address selecting the neurons points to an address range within a union.

That is, a gain factor is assigned to each union according to the priority of the unions.

The learning rate is determined by the factor k and a number of external determinants.

The learning rate specifies the slope of the learning curve for the network, that is, the degree to which the error is minimized in each training cycle.

The neural network has two outputs: the output “prob” gives the arithmetic probe value, while the other output specifies the factor k that is applied for determining the partitioning of the unions. A possible structure of the neural network is illustrated in FIG. 18 b, showing neurons of the input layer 160, and outputs 169 and 168 yielding “prob” and the factor k. The neural network shown in the figure has a hidden layer 162 as well, but the inventive method also works successfully with a network of simpler structure.

In this manner, to each row of the table constituting the knowledge base of the neural network there are assigned the frequencies f(0) and f(1), which can be regarded as counters. These counters, similarly to elements of the table FreqTbl[index] seen above, specify how many times the currently coded bit has been 0 or 1 when the given table row was selected by one of the addresses adr1-adr4 pointing to the subranges 151-154 produced by the subdivision of table 155. Thus, frequencies f(0) and f(1) may be stored in a table of N rows, and, similarly to the way described above, they should be re-scaled if their value exceeds a given limit.

VI.5. Next, the partitions of the register 150 are tested iteratively to select the best partition. The neural network updates the frequency data f(0), f(1) of the most recently addressed table rows based on the value of the next received bit, and “teaches” to the neuron weight functions stored in these rows the last value of k and the probe factors derived from frequencies f(0), f(1), with regard to the difference (error) between the predicted and received bit value.

The operation of the system is in many respects similar to the methods described above, but the dynamic register partitioning and the utilization of dynamic gain and learning rate are fundamentally novel elements.

Let us see an illustration of how efficient the method is in practice:

We coded the DCT coefficients of 20 8×8 blocks, producing 1280 bytes of data. The coded blocks were adjacent in the frame, and were practically identical. We were interested mainly in the coding efficiency of the methods in a situation where there were recurring data sequences of DCT coefficients in the different blocks, but there were no recurring sequences within individual blocks themselves.

The results are compared to other methods in the following table:

| Type                               | Input data (bytes) | Output data (bytes) |
|------------------------------------|--------------------|---------------------|
| VLC*                               | 1280               | 690                 |
| AMT binary arithmetic model*       | 1280               | 550                 |
| Arithmetic (Markov model, 200K)    | 1280               | 350                 |
| Mahoney's neural arithmetic (256K) | 1280               | 149                 |
| AMT neural arithmetic (1K)         | 1280               | 76                  |

*multiple-level method

With other data types the results show a greater spread, but our method still performs significantly better than the other solutions.

VII. Bandwidth (Transfer Rate) Control and the Regulation of Compression Ratio

VII.1. Bandwidth (transfer rate) control is one of the most important issues in video encoding. The information content of frames in a video frame sequence varies to a great extent, so in case the aim is to maintain a substantially even image quality, the compression ratio has to be adjusted over a large scale to follow these changes.

If the compressed data are stored on a storage device, the storage capacity of the device will limit the total amount of coded data. The most obvious case where the bandwidth is a constraint occurs, however, when data must be transferred in real time over a data transfer system. In that case the quality of the transferred video is limited by the available bandwidth (data transfer rate). Therefore, it is necessary to keep the transfer rate (the amount of data transferred in one second) at a near-constant value. This can be achieved only by regulating the compression ratio. Hereafter, compression ratio regulation means increasing or decreasing the quantization factor. However, quantization cannot be increased without consequences. Higher quantization causes image details to disappear with the decrease of discernible spatial frequencies present in the image, with the visually perceptible error increasing at the same time. At a certain point the image inevitably falls apart into blocks and other visually conspicuous image distortions occur.

Modifying the quantization factor in accordance with the available bandwidth is known per se. In conventional solutions, the new quantization factor is calculated from the expected and actual length of the coded sequence and is applied to the next frame. Instead of taking into account only one frame at a time, the more sophisticated solutions calculate the new quantization factor using the average length and the expected average length of frames compressed in a given timeframe. These methods usually involve a reaction delay factor, specifying the time in which the control system should achieve the computed maximum value.
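
As an illustration of this conventional (known per se) control, and not of the inventive neural control, a length-ratio based update of the quantization factor could look like the following sketch; the reaction factor and the Q limits are assumed values:

    def next_quantization(q, coded_lengths, expected_lengths,
                          reaction=0.25, q_min=1.0, q_max=31.0):
        # average coded/expected ratio over the frames of the recent timeframe
        ratio = sum(coded_lengths) / max(sum(expected_lengths), 1)
        q_target = q * ratio                    # more data than expected -> raise Q
        q_new = q + reaction * (q_target - q)   # approach the target with a delay
        return min(max(q_new, q_min), q_max)

    q = next_quantization(8.0, coded_lengths=[5200, 6100],
                          expected_lengths=[5000, 5000])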

Such methods provide a constant transfer rate or constant bit rate (CBR method).

Results can be significantly improved by setting a minimum and a maximum limit for transfer rate control, always keeping the transfer rate within these limits and attempting to achieve in the long run a dominant transfer rate equalling the mean value of the upper and lower limits.

It is preferable that the signal-to-noise ratio between the original and reconstructed frames also be taken into account as a control parameter, that is, the transfer rate should be increased (within the specified limits) in case the SNR deteriorates, and the transfer rate may be lowered if the SNR improves. This is the so-called variable bit rate (VBR) method. A major drawback of this solution is that the total expected data length cannot be predicted exactly. Minimum and maximum values cannot be set too high, because then the control range would also be too wide and the total coded data length would vary over a too large scale. It also often happens that the desired quality cannot be maintained with the maximum transfer rate set by the system, making it necessary to further increase the transfer rate.

VII.2. According to the present invention, two solutions are provided for bandwidth (transfer rate) control. Both methods are based on the application of a neural network.

VII.3.1. According to the first solution, the neural network has a backpropagation structure that is known per se. The network is illustrated in FIG. 18 c. The network has N inputs 180_(1)-180_(n), a hidden layer 185 containing 2N neurons, and at least one output. The quantization factor Q and the scaling factor S (the role of which is described in detail below in section VIII.) appear at outputs 188 and 189.

In order to provide continuous control, the input of the network is the last N received data elements, which are fed sequentially to the N inputs in their order. The data elements are constituted by the ratio of the expected and coded frame lengths, and these data elements are considered as a temporal sequence:

EL_(i−1)/CL_(i−1), EL_(i)/CL_(i), EL_(i+1)/CL_(i+1), etc. (where EL: Expected Length, CL: Coded Length, and i is the index corresponding to the i-th data element, for instance a coded frame)
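
A simple way to picture this temporal input is a sliding window of the last N ratios, as in the following sketch (the window size and the deque-based bookkeeping are assumptions):

    from collections import deque

    N = 32                                   # assumed time-window size
    window = deque([1.0] * N, maxlen=N)      # EL/CL ratios, newest last

    def push_frame(expected_len, coded_len):
        window.append(expected_len / max(coded_len, 1))
        return list(window)                  # the N values fed to the N inputs

    inputs = push_frame(expected_len=5000, coded_len=6200)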

Test data applied for training the network were produced using an external control system or manually prior to the beginning of the training process, with the data being reduced by complex correlation calculations to 1000-4000 training samples. Training samples are devised such that they represent every frame type occurring in an ordinary video recording. These samples are “taught” to the system through several training cycles, in a concrete implementation 50000 cycles.

After training has been completed, the network is ready for receiving and processing real data. It should be noted that in this concrete embodiment the calculations have been established such that it was not the quantization factor Q that appeared at the output of the network, but a k_(Q) coefficient, which was in turn used for modifying the current Q.

The main advantage of neural networks compared to discrete systems is that virtually all types of control characteristics can be implemented with them.

Control characteristics are much more important in a variable bit rate control system than in a constant bit rate control system.

VII.3.2. As we have already mentioned, the network operates with supervised training. The neural network estimates the desired mapping from previous expected and coded frame length values, the latter being regarded as a data sequence. This method can be applied successfully in case of constant bandwidth (CBR) but is not always suitable in itself for variable bandwidth (VBR) systems. The reason for that is that VBR systems also take into account image quality.

In case the image quality exceeds the average quality value, the compression ratio is increased, while if the quality is poorer than the average value, the compression ratio is decreased. The neural network control system must also take this into account. Therefore a minimum and a maximum quantization factor is specified for the control network, which may not be exceeded by the control system. These values take effect through the neurons designated with reference numerals 186 and 187 in FIG. 18 c.

VII.3.3. The neural network applied for VBR mode has twice as many inputs as the network used for CBR mode, because, in addition to the expected/coded length ratio, data representing image quality are also fed to the input of the network in the form of expected/coded quality ratios:

EQ_(i−1)/CQ_(i−1), EQ_(i)/CQ_(i), EQ_(i+1)/CQ_(i+1), etc.

where EQ is the expected quality, CQ is the coded quality, and i is the data index.

As seen in FIG. 18 c, in addition to the output 188 determining the quantization factor, the neural network used in VBR mode may also comprise a further output 189 representing the scaling factor S (the role of the latter is described later). Similarly to the above described case, the network processes the input data of expected/coded quality and expected/coded length in a time sequence during training, and estimates the sought mapping in accordance with the specified minimum and maximum values. Training data are chosen to reflect the specified control characteristics and control slope.

During the real-coding operation of the network, i.e. when real data are processed, there is no further training, and the weight functions of the neurons remain constant. Coding itself is in effect an association task: the received input data contained in the time slot are mapped to the Q and S factors on the basis of what the network has “learned” during the training process. Thus, by determining the next values of Q and S according to the received length and quality data, the network performs the same task as conventional control systems. Intuitively, the time slot can be said to describe a particular situation. The network will search among the situations it encountered during training for the one that best matches the current one, giving the optimal response learned in conjunction with that particular situation.

An implemented variant of the neural network substantially uses only a single formula, the so-called sigmoid sum:

${out} = \frac{1}{1 + \exp( {{- {Gain}}*{\sum\limits_{i = 0}^{N}{W_{(i)}*{Input}_{(i)}}}} )}$ where N is the number of neurons

The “gain” value may be unity, and can be determined by an optimum search, as its only role is to determine the order of magnitude of the output values. First, the weight functions of the neurons in the hidden layer(s) are summed with the above expression, then the calculation is performed for the weight functions of the output layers as well.
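
A minimal forward-pass sketch of such a network (N inputs, a hidden layer of 2N neurons, two outputs for Q and S) using the sigmoid sum; the random weights, the gain of 1 and the layer sizes are illustrative assumptions, not trained values:

    import math, random

    def sigmoid_sum(weights, inputs, gain=1.0):
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-gain * s))

    N = 8
    random.seed(0)
    hidden_w = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(2 * N)]
    out_w = [[random.uniform(-1, 1) for _ in range(2 * N)] for _ in range(2)]  # Q and S

    def forward(inputs):
        hidden = [sigmoid_sum(w, inputs) for w in hidden_w]
        q_out, s_out = (sigmoid_sum(w, hidden) for w in out_w)
        return q_out, s_out

    q, s = forward([1.0] * N)   # e.g. expected/coded length ratios from the time window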

In their practical implementation, there is no significant difference between the VBR and CBR networks, except for the input data, which means that the network performing the VBR mode can perform the functions of the CBR mode as well. For CBR-mode operation, this is achieved by simply providing a constant value at the quality inputs (at the maximum possible value; these inputs are kept constant during training as well). In CBR mode the minimum and maximum bandwidth limit inputs are set equal and are kept constant, set to values corresponding to the desired constant bandwidth.

VII.4. In this section, another variant of the improved dynamic quantization performed by a neural network is described (the so-called address decomposition method).

VII.4.1. This network model is a variety of the one presented in the section dealing with arithmetic coding. It only differs from the above network in that the addressing of certain, selected neurons of the network is determined not by the entire time window/time slot. This is illustrated schematically in FIG. 19, showing that the input data examined in the time window bypass the address generator. Here, the elements of the input data sequence are not 0-s and 1-s, and therefore the address generation procedure described above cannot be applied directly. Instead, the input layer of the neural network consists of two parts. To each data value that can be found in the time window, a hash function (similar to the example illustrated above) assigns a neuron, selected from an appropriate number of neurons, such as 2048 or 256, depending on whether the incoming data is expected/coded length data or expected/coded quality data. In effect, this means two neuron weight tables, one for the neurons (more precisely, for the weight functions thereof) working with expected/coded length data, and another table for the weight functions of the neurons working with expected/coded quality data.

If the time window size is N=32, and address generation is performed using 11 and 8 bits, respectively, for the two data types, the size of the memory needed to store the input data of the neurons will be Mem = 2048*16 + 256*16 (16×11-bit-long normalized data for addresses generated from expected/coded length data and 16×8-bit-long normalized data for addresses generated from expected/coded quality data).

In the used address conversion procedure, the inputs of the neural network are normalized data with values between 0 and 1, which are subsequently converted into integers for address generation:

C_(vk) = VK_(n)*2¹¹
C_(m) = M_(n)*2⁸  (Equation III)

where VK_(n) is the normalized expected/coded length ratio and C_(vk) is the generated address, and where M_(n) is the normalized quality and C_(m) is the generated address.
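
A sketch of this address generation (Equation III) under the stated bit widths; the clamping to the table range is an added safeguard, not taken from the text:

    LENGTH_TABLE_BITS = 11    # 2**11 = 2048 neurons for expected/coded length data
    QUALITY_TABLE_BITS = 8    # 2**8  = 256 neurons for expected/coded quality data

    def length_address(vk_normalized):
        # C_vk = VK_n * 2^11
        return min(int(vk_normalized * (1 << LENGTH_TABLE_BITS)),
                   (1 << LENGTH_TABLE_BITS) - 1)

    def quality_address(m_normalized):
        # C_m = M_n * 2^8
        return min(int(m_normalized * (1 << QUALITY_TABLE_BITS)),
                   (1 << QUALITY_TABLE_BITS) - 1)

    addr_len = length_address(0.37)   # -> 757
    addr_q = quality_address(0.9)     # -> 230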

Based on these expressions, addresses are generated from the data located at each position of the time sequence. The addresses so generated then address the neurons stored in the tables. In other words, the neurons are selected by the generated addresses, and the neurons receive the ratios of expected/coded length and expected/coded quality during the training process. The system has two other inputs that are not associated with the time sequence. Similarly to the network shown in FIG. 18 c, these inputs are applied for determining the minimum and maximum bandwidth. An error factor is calculated using the expression (1/Q−1/Q_(prev)), i.e. the error factor is determined as the difference of the reciprocal of the current training quantization factor and the reciprocal of the previously calculated quantization factor, where Q_(prev) is the previous quantization factor.

The weight function is modified as follows:

$W = {W + {( {\frac{1}{Q} - \frac{1}{Q_{prev}}} )*{eta}*{input}}}$ (where eta is the learning rate)

${out} = {\exp( {\sum\limits_{i = 0}^{N}W_{(i)}} )}$ where N is the number of neurons
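
Read together, the two formulas suggest an update and readout of the addressed weights along the following lines; the learning rate and table size are assumed, and the addressed-row bookkeeping is simplified:

    import math

    def train_addressed_weights(weights, addresses, inputs, q, q_prev, eta=0.01):
        error = 1.0 / q - 1.0 / q_prev           # error factor (1/Q - 1/Q_prev)
        for addr, x in zip(addresses, inputs):
            weights[addr] += error * eta * x     # W = W + (1/Q - 1/Q_prev)*eta*input

    def network_output(weights, addresses):
        return math.exp(sum(weights[a] for a in addresses))   # out = exp(sum of W_i)

    weights = [0.0] * 2048
    train_addressed_weights(weights, addresses=[757, 12, 900],
                            inputs=[0.37, 0.8, 0.5], q=6.0, q_prev=8.0)
    out = network_output(weights, addresses=[757, 12, 900])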

There is no further training during the effective control operation of the network, i.e. this system also uses a pre-trained network.

The process of the control operation is identical with the training process, except that weight modification is not activated.

In a manner similar to what has been described above, the system can be switched to CBR mode by fixing the min/max bandwidth and quality data at respective constant values.

This system operates flawlessly in multiple-step mode as well. The essence of this mode of operation is that in a first step, the system encodes the entire footage with a constant quantization factor (e.g. with Q set to 3), without control. In the subsequent second step, coding is performed with the control system activated. This solution provides improved-precision coding, because the first step specifies the degree to which each frame can be compressed, so Q need not be determined, but may be directly adapted from step 1. Otherwise, the inventive neural network can be applied without any modifications. In multiple-step mode, training can be performed using fast-training procedures. Also, interpolation is highly effective in this mode: we have observed that the quality achieved in 4-6 steps by discrete control systems can be reached by the neural control system in as little as two steps.
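
An outline of this two-step workflow is sketched below; encode_frame and controlled_q are hypothetical placeholders standing in for the encoder and the control decision, and the fixed first-pass Q of 3 is taken from the example above:

    def two_step_encode(frames, encode_frame, controlled_q, fixed_q=3):
        # Step 1: uncontrolled pass with a constant quantization factor
        first_pass = [encode_frame(f, q=fixed_q).coded_length for f in frames]

        # Step 2: controlled pass; per-frame Q adapted from the first-pass lengths
        output = []
        for f, length in zip(frames, first_pass):
            q = controlled_q(length)      # e.g. neural or discrete control decision
            output.append(encode_frame(f, q=q))
        return output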

It should be noted that, compared to the single-step mode, the multiple-step mode does not necessarily provide better quality. One of the advantages of this mode is that the length of the output can be adjusted to a predetermined value, corresponding e.g. to the capacity of a storage medium.

VIII. Dynamic Scaling

VIII.1. As has already been indicated, it may often happen that the desired length of the coded video cannot be maintained with the desired video quality. This might be caused, e.g., by the preset compression ratio being extremely high for particular sequences, so that the desired image quality cannot be maintained using the currently set min/max limits of compression. Such a situation typically occurs with highly detailed and action-rich footage. As an example, the first 5 minutes of the feature film “The Mummy Returns” may be mentioned. In case of this particular sequence, a bandwidth of at least 2.5 Mbit/s would be needed for good-quality MPEG compression. However, if the available bandwidth decreases to 1.5 Mbit/s, rather complex pre- and postfiltering operations would be needed both at compression and decompression time in order to eliminate errors. This would strongly decrease image sharpness, to the extent that the quality of the coded video would barely reach the “acceptable” level.

VIII.2. In order to eliminate the problems described above, in accordance with the present invention, the concept of dynamic scaling has been introduced. This essentially means scaling down (re-scaling) if the control system is unable to maintain the desired image quality due to fixed external boundary conditions. The frames are scaled down (re-sized) to a size that provides satisfactory results. The system compresses this reduced-size frame and, at decompression, restores it to its original size. Understandably, image quality deteriorates in this case as well; however, this will primarily appear as reduced sharpness. Blocking artefacts and other typical errors caused by the compression do not arise, at least if the compression ratio is not set extremely high.

We have examined what would happen if the amount of input data were reduced for such critical frame sequences, while the compression factors were left unchanged. In other words, the frames were scaled down to ¾ or ½ of their original size. Because the encoder strives to keep the data rate constant as far as possible, taking into consideration the image quality, the reduction of frame sizes results in a reduced degree of compression. In this manner, the amount of output data remains the same as before re-scaling.

For instance, if the entire data stream is coded with a bandwidth of e.g. 0.5 Mbit/s, and frames are reduced to half their original size in critical sequences, the actual transfer rate remains 0.5 Mbit/s, but both the compression ratio and the quantization factor are significantly reduced. In case of the above example, the latter factors were reduced to an extent that corresponded to an image quality coded with a 2 Mbit/s bandwidth at the original frame size. This entails a reduction of the errors. The drawback of the method is that scaling reduces the resolution of the frames. Thus, when frames are restored to the original size during decoding, the values of the missing pixels must be inferred. However, this problem can be significantly reduced by applying a suitable scaling method. It must be taken into account that there are spatial frequency components in the frame, and the transformation must be performed accordingly.

VIII.3. Accordingly, the dynamic scaling method according to the invention needs scaled images. A number of interpolation-based frame scaling methods were tested. The Láncos method yielded the best results (the Láncos method is a resampling procedure known per se that interpolates the missing pixels by a filter, based on the spatial frequency components of the image). If compression with and without scaling are compared, it turns out that without scaling, in critical sequences the quality loss can be easily perceived if the stream is compressed for a transfer rate of 0.5 Mbit/s. Many areas in the image become completely “flat”, blocking artefacts and stripes appear, and image sharpness is drastically reduced in some areas, as if an eraser had been applied to the image. On the other hand, in case the compression is performed with the frame scaling according to the invention, none of these errors occur. The only perceptible error is the reduction of sharpness. However, having analyzed the sequences, it was found that scaling is typically needed at those points where fast motions occur in the video footage. Because fast-moving scenes are usually slightly blurred in the original already, the information loss caused by re-scaling is barely perceptible.

The inventive dynamic scaling procedure is performed as follows:

Each incoming frame passes through the scaling module, with a scaling factor of 0 (no scaling) at the beginning. The compression control system decides if the result of the coding is satisfactory within the specified limits. If the result is not satisfactory, the frame size is changed (reduced) to a degree that provides acceptable quality even after decoding.
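
A hypothetical sketch of this decision loop; encode_frame, quality_of and resize are placeholders, the scale steps of ¾ and ½ and the floor of half size come from the description, and the acceptance test is an assumption:

    SCALE_STEPS = [1.0, 0.75, 0.5]    # no scaling, 3/4, 1/2 of the original size

    def encode_with_dynamic_scaling(frame, encode_frame, quality_of, resize,
                                    expected_len, min_quality):
        for scale in SCALE_STEPS:
            candidate = resize(frame, scale) if scale < 1.0 else frame
            coded = encode_frame(candidate)
            # accept the first scale at which both length and quality are satisfactory
            if coded.length <= expected_len and quality_of(coded) >= min_quality:
                return coded, scale
        return coded, SCALE_STEPS[-1]     # fall back to the smallest allowed size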

It should be noted that the scaling task can be solved utilizing discrete methods. But, considering that a neural network has already been applied in the inventive system for bandwidth control, the scaling may also be performed, more efficiently, with a neural network. As the problem is closely related to the problem of bandwidth control, it has proved to be satisfactory to add another output to the neural network (see FIG. 18 c). Thus, the network has two outputs, one providing the Q coefficient of the quantization factor, the other the scaling factor S.

In the first solution provided for bandwidth control (see section VII.3 and FIG. 18 c), a new neuron 189 may be directly inserted, its output providing the scaling factor S. However, for the second proposed solution (see section VII.4.), in practice two weight vectors should be assigned to each table address.

This practically corresponds to two independent neural networks, having identical inputs but sending different values to the outputs. To render the network more sophisticated, it is proposed to add a hidden layer to the neural network, with the neurons thereof connected to the output layer. In that case the network will have a so-called backpropagation structure. Here again, as in the previous solution, the neurons of the input layer are selected by the positions pointed to by the addresses generated by the system.

The scaling procedure starts at an I (intra) frame and lasts until the next I-frame. Scaling factors determined for frames of other types are averaged. This is shown in FIGS. 20 and 21, where scaling is started at A and ended at the point marked B.

X. Some General Remarks Concerning the Neural Control System Applied for the Present Invention

We have tested a number of different network types for potential application with the inventive control system. Taking into account their computational load and proper control behaviour, surprisingly the best results were produced by the networks with the simplest structure. The so-called counterpropagation-type networks may also give excellent results, providing in many cases better approximation than the backpropagation-type network described above, but only if the address-decomposition method (see above) is applied. Summing up, the second method performed better than the first method, due to the fact that it uses many more neurons than its counterpart, which provides a larger knowledge base. Converted to backpropagation or counterpropagation, it provides excellent control.

The invention is essentially based on the idea of providing a control system that is capable of realizing different control characteristics, and ensuring optimal compression while also taking into account the visually perceived image quality. Different learned characteristics can be grouped into profiles that enable the selection of the characteristics most appropriate for any given video sequence. We have also tested discrete control systems and have found that they have inadequate control dynamics. For instance, if coding was carried out in two steps using medium bandwidth, there could always be found sequences that would have needed higher compression, or scenes where it would have been satisfactory to use lower compression. Known discrete coding systems are closed systems, meaning that they usually perform encoding using constant functions. The neural system is, however, capable of taking into account the information of previous frames and performing coding control using the learned momentary characteristics.

Because different neural network models are known per se, the operation of the inventive neural networks has not been analyzed in detail in this document. We have only provided a number of concrete implementations of neural networks adapted for coding video frame sequences.

XI.1. A Summary of the Operation of the Hybrid Video Coding System Implementing the Inventive Methods

The structure of the entire system is depicted in FIG. 8 and FIGS. 22-24. During compression, frames first pass through the scaling system (resampler module) 61 (FIG. 8). The system decides with which method the coding should continue, and selects the coding mode (I, P, B) accordingly. In case of an I-frame, the different predictions are carried out for each block, and the system, based on the result of the variance analysis, selects the prediction promising the best compressibility. Then the blocks are transformed with DCT, quantized and compressed at the appropriate level.

In case of a P frame only the preceding frame, while for a B frame both the preceding and the subsequent frames are used as reference frames for searching a matching reference block for the current block to be coded. The found block is then compensated in accordance with the block size and position (it is subtracted from the reference block, optionally using ½ or ¼ pixel resolution search and motion compensation), then the block is predicted, transformed and coded. At the same time, the found reference positions are converted into motion vectors, and the motion vectors are subtracted from the previous ones, and compressed at the appropriate level. The compression ratio is regulated by the control system in accordance with the expected and coded length and quality values. If the desired quality cannot be maintained within the current limits, the frame is scaled down to a size at which the quality becomes acceptable. It has to be noted here that in the concrete implementation the system never reduced frames to smaller than half of their original size.
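
The per-frame flow summarized above can be pictured with the following outline; every name is a hypothetical placeholder standing in for one module of the system (resampler, mode selection, prediction, DCT, quantization, entropy coding, control), so this is a sketch of the data flow rather than the actual implementation:

    def encode_sequence(frames, modules, control):
        for frame in frames:
            frame = modules.rescale(frame, control.scale_factor)     # resampler module
            mode = modules.select_mode(frame)                        # I, P or B
            for block in modules.partition(frame):
                if mode == "I":
                    pred = modules.best_intra_prediction(block)      # variance analysis
                else:
                    pred = modules.motion_compensate(block, mode)    # reference search
                coeffs = modules.dct(modules.subtract(block, pred))
                modules.entropy_code(modules.quantize(coeffs, control.q))
            control.update(expected_len=modules.expected_length(),
                           coded_len=modules.coded_length(),
                           quality=modules.quality())                # neural control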

Surprisingly, it was found that the implemented method performed far better than expected. Above the bandwidth of 450 kB/s there are hardly any visually perceptible errors in critical sequences, apart from a reduction in sharpness caused by re-scaling. It has been found that with a transfer rate in the 380-450 kB/s range the inventive hybrid coding system provides the quality of the SP mode of an average video recorder, while in the range of 280-380 kB/s the quality corresponds to the LP mode of a common video recorder. In case the bandwidth exceeds 500 kB/s, the video quality approaches DVD quality. Above the 750 kB/s limit it is practically visually indistinguishable from DVD.

A drawback of the inventive coding system is that, due to the arithmetic coding, it is sensitive to errors caused by data loss in the transmission channel. However, contemporary digital transmission networks (such as the Internet) are capable of high-security and substantially loss-free data transfer, even for very high amounts of data, so this drawback is not significant. For the operation of the coding system with good efficiency, the frequency table should be updated continuously. If a transmission error occurs somewhere during the decoding process, then from that point on all data until the end of the affected frame will be damaged.

XI.2.1. The operation of the inventive hybrid video decoder applied for decoding data compressed with the system according to the invention is explained with reference to FIG. 22. Frame reconstruction starts by feeding the encoded data into the input buffer 121 and decoding the stream information block 133. The stream information block 133 contains the original size of the frame and other data that do not change in the course of decoding. Thus, the stream information block 133 is decoded only once, at the beginning of the decoding process. Next, the frame header information is decoded (step 122). The frame header information block contains the current frame size, the frame type (I, P, B), the quantization type, and other data pertaining exclusively to the given frame.

If the frame is an intra frame, the QuadTree structure describing the block partitioning is decoded (step 123) together with the DCT coefficients and specific information pertaining to individual blocks (step 126). Next, the inverse transformations are carried out (steps 127, 128, 129) on each block, the resulting inverse transformed blocks being written into the current video memory 131 storing the new frame.

In intra frames each block contains all the data needed for its reconstruction (particularly the prediction type and information indicating if the block has been partitioned as a 16×16 block or as four 4×4 blocks, etc.).

In case of an inter frame, first the Quad-tree structure describing the block partitioning is decoded at step 123, because this tree structure contains the data needed for the reconstruction of the block. These data are used for decoding the DCT coefficients, motion vectors, and prediction codes associated to individual sub-blocks, and also for the decoding of the codes identifying the reference frames that were used for coding. The inverse transformations are also carried out (steps 127, 128, 129), and then those blocks of the reference frame stored in the reference memory 125, which blocks were selected using the motion vectors in step 124, are added to the inverse transformed blocks in step 130.

If the frame was coded using linear interpolation, then first the interpolated block is generated on the basis of the block selected by the reconstructed motion vectors in step 124 and the reference frames stored in the reference memory 125, and this interpolated block is then added to the inverse transformed block. Each reconstructed block is written into the current video memory 131 storing the new frame.

Both intra and inter decoded reconstructed frames are written into the reference memory 125. The reference memory 125 may contain more than one frame, depending on the furthest reference frame used during the coding process. The reference memory 125 is a circular buffer, meaning that the oldest frame is deleted each time a new frame is stored.

The next step is restoring the frame size to the original in step 132. The frame size is restored preferably with the Láncos method. Scaling is carried out both during coding and decoding by a suitable subroutine. In case there is available a hardware video source or output device which is capable of scaling, the coder or decoder needs to specify only the frame size.
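
An outline of the decoder flow of FIG. 22, following the step numbers used above; every helper on the d object is a hypothetical placeholder, and only the ordering of the steps is taken from the description:

    from collections import deque

    def decode_stream(bitstream, d, max_reference_frames=2):
        stream_info = d.decode_stream_info(bitstream)       # block 133, decoded once
        reference = deque(maxlen=max_reference_frames)      # circular reference memory 125
        while d.has_more_frames(bitstream):
            header = d.decode_frame_header(bitstream)       # step 122
            quadtree = d.decode_quadtree(bitstream)         # step 123
            frame = d.new_frame(header)
            for block in d.decode_blocks(bitstream, quadtree):          # step 126
                rec = d.inverse_transform(block)                        # steps 127-129
                if header.frame_type in ("P", "B"):
                    rec = d.add_reference(rec, block, reference)        # steps 124, 130
                d.write_block(frame, rec)                               # video memory 131
            reference.append(frame)                          # oldest frame dropped
            yield d.rescale_to_original(frame, stream_info)  # step 132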

XI.2.2. Neural Decoder

The neural arithmetic decoder operates in substantially the same way as the neural arithmetic coding module, since, as is known per se, in arithmetic coding the operation of the coder is the same as that of the decoder. Because the method is adaptive, a single bit is decoded at the beginning, and the new predictor is computed using the decoded bit. For computing the predictor, the neural network used for coding can be used without any alteration. The difference between the coder and decoder manifests itself only in differences in the mathematical calculations that are known per se, with the other functional elements being fully identical.

Finally, a complete video coding/transcoding system is presented (see FIG. 23).

The inventive video coding system is capable of digitizing, efficiently coding and storing video signals. At the same time, it is also capable of transcoding already encoded digital video data for increased storage efficiency. For instance, such transcoding can be applied for reducing the bandwidth of MPEG transport packets of a DVB broadcast from approx. 20 Mbit/s to approx. 600 kbit/s, e.g. for recording satellite or television broadcasts. In a similar manner, the inventive high-efficiency coding method can also be used for storing video sequences recorded with digital video cameras, even without the application of mechanical devices.

Inputs of the coding system are constituted by the analogue video input 93, the combined decoded MPEG digital video/audio packet input 94, and the analogue audio input 105.

The coding system can be operated in the following modes:

-   a, coding the signals coming from analogue video input 93 and analogue audio input 105 after digital conversion;
-   b, transcoding the digital video signal 94 and the audio signal, which latter is separated by demultiplexer 109 from the combined digital video/audio signal packet.

Digital video data selected by selector 96 are fed through input 97 to the coding system 98 (explained above in relation to FIG. 8). Coded video data 99 are multiplexed with the digital audio data into a combined packet 101 by multiplexer 100. The digital packets, being routed by the PSC (Peripheral System Controller) 102, can be stored on hard disk 103, on an optical storage device or in semiconductor memory 104. The digital audio signal that was selected by selector 107 is coded by encoder 108 and is stored as explained above.

XI.2.3. Decoding of the stored video and audio data is illustrated in FIG. 24.

Demultiplexer 110 separates the data packet stored in semiconductor memory 104 or on hard disk 103 into coded digital video data 111 and coded digital audio data 112. The digital video data 111 are decoded by the decoding system 113 that was described above referring to FIG. 22. The decoded video data 114 are optionally fed into the filtering and scaling module 115, and then converted into an analogue video signal 117 by a D/A converter 116. The digital audio data 112 separated by demultiplexer 110 are decoded by decoder 118, and are finally converted back into an analogue audio signal 120 by D/A converter 119.

The invention is not restricted to the embodiments presented above, but other variations and implementations can also be conceived.

1. Method implemented on a computer having a processor and a memory coupled to said processor for compressing a digitally coded video frame sequence, comprising the steps of a, dividing a given frame into blocks, b, optionally, further dividing individual blocks into smaller blocks, c, modifying the information content of selected blocks relying on information contained in a neighbouring block or blocks, d, generating transformed blocks by carrying out on the selected blocks a transformation (DCT) that converts spatial representation into frequency representation, and finally e, encoding the information content of the transformed blocks by entropy coding, characterised by that i, compressibility analysis is performed on said selected blocks before carrying out the transformation specified in step d, and, depending on the result of the analysis, ii, steps c, and d, are carried out on the block or iii, optionally, the block is further partitioned into sub-blocks, and the compressibility analysis specified in step i, is performed again on the blocks resulting from individual partitioning, and iv, the block partitioning that will potentially yield the best results is chosen relying on results given by steps i and iii, and finally v, the transformation specified in step d, is carried out using the block partitioning with the best potential results, relying on the prediction specified in step c, wherein at least some of steps a through e are performed using said processor.
 2. The method according to claim1, characterised by that the compressibility analysis of blocksbelonging to individual block partitionings is performed taking intoaccount the content of the blocks and/or the frequency of occurrence ofindividual block types.
 3. The method according to claim 1,characterised by that the contents of the blocks are subjected tovariance analysis either directly or by way of a Hadamard filter duringthe compressibility analysis.
 4. The method according to claim 3,characterised by that the variance analysis is carried out using thefollowing formula:${variance} = \frac{{\sum\limits_{j = 0}^{M}{pixel}_{j}^{2}} - ( {\sum\limits_{j = 0}^{M}{pixel}_{j}} )^{2}}{M}$where M is the number of elements in the given block or sub-block andpixel (i) is an element of the uncompressed block, with the computedvariance value being compared with a given threshold value to establishif the variance exceeds said given threshold value.
 5. The methodaccording to claim 1, characterised by further encoding with the entropycoding specific data that are assigned to blocks with the maximumallowed block size in a given frame, the specific data representing theblock partitioning of the block they are assigned to (Quadtree).
 6. Themethod according to claim 1, characterised by that discrete cosinetransform (DCT) is applied as the transformation that converts therepresentation in the spatial domain into a representation in thefrequency domain.
 7. The method according to claim 6, characterised bythat DCT is applied on blocks smaller than 16×16, and a Hadamardtransform is applied on blocks with a size of 16×16 pixels.
 8. Themethod according to claim 1, characterised by that the informationcontent of the modified (predicted) blocks is quantified during thecompressibility analysis with the following formula:${sum}_{(l)} = {\sum\limits_{i = 0}^{M}{{abs}( {pixel}_{(l)} )}^{2}}$where M is the number of elements in the predicted block, and pixel (i)is an element of the predicted block, with the computed “sum” valuebeing compared with a given threshold value or against a former “sum”value to establish if the computed “sum” value exceeds said giventhreshold value or said former “sum” value.
 9. The method according toclaim 8, characterised by that during the prediction of individualblocks prediction is carried out using multiple prediction modes, withthe prediction mode yielding the lowest“sum” value being applied on thegiven block.
 10. The method according to claim 1, characterised by thatin case the occurrence count of individual block sizes establishes thatthe frequency of occurrence of the two most frequently occurring blocksizes exceeds a given value, all blocks are replaced with blocks of thetwo most frequently occurring block sizes.
 11. The method according toclaim 1, characterised by that an error is computed during thecompressibility analysis of blocks, with the blocks contributing to theerror above a threshold value being divided into further sub-blocks,taking into account the computed error.
 12. The method according toclaim 11, characterised by that if the error exceeds a predeterminedvalue in case of a given sub-block, that sub-block is divided intofurther smaller sub-blocks and the compressibility analysis is performedon the resulting block partitioning which includes the smallersub-blocks.
 13. The method according to claim 1, characterised by that blocks and sub-blocks of sizes of 16×16, 8×8, 4×4 or 2×2 are used.
 14. Method implemented on a computer having a processor and a memory coupled to said processor for compressing a digitally coded video frame sequence, comprising the steps of a, dividing a given frame into two-dimensional blocks, b, establishing a block partitioning of the frame, in specific cases by dividing individual blocks into further sub-blocks, c, carrying out on the information content of blocks a transformation (DCT) that converts spatial representation into frequency representation, producing thereby transformed multiple-element two-dimensional blocks and d, modifying the elements of the transformed blocks according to external boundary conditions, and finally e, encoding the information contained in transformed blocks by entropy coding, characterised by that in step d, the data in the transformed multiple-element two-dimensional blocks are modified depending on the size of the blocks and on the bandwidth available for transmitting coded data, and wherein at least some of steps a through e are performed using said processor.
 15. The method according to claim14, characterised by that the modification of transformed blocks is aquantization.
 16. The method according to claim 15, characterised by that the quantization is an MPEG quantization, according to the following function: ${qcoeff}_{(j)} = \frac{\frac{( {{data}_{(j)}*16} ) + ( {{matrix}_{(j)}*0.5} )}{{matrix}_{(j)}}*\frac{2^{17}}{{QP}*2}}{2^{17}}$ where qcoeff (j) is an element of the transformed multiple-element two-dimensional block, matrix (j) is an element of a matrix corresponding in size to the transformed multiple-element two-dimensional block, and QP is the quantization factor.
 17. The methodaccording to claim 16, characterised by that values of matrix (j) aretaken from an empirically established matrix table, where individualelements of the table are entire matrix (j) matrices, with selectionfrom said table being performed according to the external boundarycondition specified in step d.
 18. The method according to claim 17,characterised by that selection from the table is performed with respectto the value of the QP quantization factor.
 19. The method according to claim 16, characterised by that the entire QP domain is divided into N subdomains with matrix tables being assigned to individual subdomains, where the size of said matrix tables corresponds to the block size, with each subdomain being assigned to a previously specified bandwidth range.
 20. The method according to claim 16, characterised by that the external boundary condition of step d, is placed by available storage capacity and/or the available bandwidth.
 21. The method according to claim 14,characterised by that in specific cases the information content of theselected blocks is modified prior to the transformation carried out instep c, on the basis of the information contained in previously selectedimage elements of a neighbouring block or blocks or the informationcontent of a reference block included in a reference frame.
 22. Themethod according to claim 14, characterised by that for encoding intraframes steps of the method according to method for compressing adigitally coded video frame sequence, comprising the steps of a,dividing a given frame into blocks, b, optionally, further dividingindividual blocks into smaller blocks, c, modifying the informationcontent of selected blocks relying on information contained in aneighbouring block or blocks, d, generating transformed blocks bycarrying out on the selected blocks a transformation (DCT) that-convertsspatial representation into frequency representation, and finally e,encoding the information content of the transformed blocks by entropycoding, characterised by that i, compressibility analysis is performedon said selected blocks before carrying out the transformation specifiedin step d, and, depending on the result of the analysis ii, steps c, andd, are carried out on the block or iii, optionally, the block is furtherpartitioned into sub-blocks, and the compressibility analysis specifiedin step i, is performed again on the blocks resulting from individualpartitioning, and iv, the block partitioning that will potentially yieldthe best results is chosen relying on results given by steps i and iii,and finally v, the transformation specified in step d, is carried outusing the block partitioning with the best potential results, relying onthe prediction specified in step c are also carried out.
 23. Methodimplemented on a computer having a processor and a memory coupled tosaid processor for compressing a digitally coded video frame sequence,where the information content of certain frames is encoded from thecontents of the preceding or subsequent frames (reference frames), themethod further comprising the steps of a, dividing the frame to beencoded into blocks, b, searching a matching reference block for thegiven block to be encoded in the reference image preceding or followingthe frame containing said block to be encoded, c, carrying out acompressibility analysis by comparing matched reference blocks and theblock to be encoded, d, selecting the best reference block relying onthe result of the compressibility analysis, and e, encoding said blockusing the best reference block just selected, characterised by that instep b, during the search for reference blocks: i) the block to beencoded is divided into sub-blocks, ii) the contents of the sub-blocksare analysed, iii) according to pre-defined criteria, a predeterminednumber of sub-blocks, preferably at least two, are selected, iv) areference block search is performed using the selected sub-blocks, saidsearch being performed in a specific search range in the selectedreference frame for the reference block containing sub-blocks thatdiffer the least from the selected sub-blocks, with the relativeposition of the selected blocks kept constant during said search, and v)the best reference block is chosen as a result of a search using theselected sub-blocks, wherein at least some of steps a through e areperformed using said processor.
 24. The method according to claim 23,characterised by that in step v) the best reference block is chosen insuch a way that every time the search finds a block that is better thanthe current reference block, position data of the newly found block arewritten into a multiple-element circular buffer, with the last elementof the buffer containing the position of the best sub-block.
 25. Themethod according to claim 23, characterised by that a reference searchis carried out using the entire block to be coded, and the search beingperformed in the vicinity of the reference block that is considered asthe best reference block, and the final reference block is chosenaccording to the result of said search performed using the entire blockto be coded.
 26. The method according to claim 23, characterised bydetermining the absolute square difference of the block to be coded andthe reference block, and deciding about the acceptability of thereference block on the basis of the determined difference.
 27. Themethod according to claim 23, characterised by that the reference blocksearch is performed in a filtered reference frame.
 28. The methodaccording to claim 23, wherein if the results are still notsatisfactory, reference block search is carried out in search rangeslocated in further reference frames.
 29. The method according to claim23, characterised by that in case the search is unsuccessful in allreference frames, the block to be coded is divided into sub-blocks, witha matching reference sub-block being searched for each sub-block, saidsearch being performed in the vicinity of the reference frame positionsthat are considered the best, in that reference frame which has so faryielded the best results.
 30. The method according to claim 29,characterised by that in case dividing the block to be coded intosub-blocks has not produced satisfactory results, the search forreference sub-blocks is carried on in the vicinity of the best positionsof other reference frames.
 31. The method according to claim 29,characterised by that in case a sub-block remained erroneous, theerroneous sub-block is further divided into smaller sub-blocks, and thesearch is repeated.
 32. The method according to claim 23, characterisedby that the block to be coded is subtracted from the reference block,and the difference block is encoded in step e.
 33. The method accordingto claim 23, characterised by carrying out on the information content ofthe difference block a transformation (DCT or Hadamard transform) thatconverts spatial representation into frequency representation, producingthereby transformed multiple-element two-dimensional blocks (matrices ofDCT or Hadamard coefficients), and encoding the information content ofthe transformed blocks by entropy coding.
 34. The method according toclaim 23, characterised by that steps of the method for compressing adigitally coded video frame sequence, comprising the steps of a,dividing a given frame into blocks, b, optionally, further dividingindividual blocks into smaller blocks, c, modifying the informationcontent of selected blocks relying on information contained in aneighbouring block or blocks, d, generating transformed blocks bycarrying out on the selected blocks a transformation (DCT) that-convertsspatial representation into frequency representation, and finally e,encoding the information content of the transformed blocks by entropycoding, characterised by that i, compressibility analysis is performedon said selected blocks before carrying out the transformation specifiedin step d, and, depending on the result of the analysis ii, steps c, andd, are carried out on the block or iii, optionally, the block is furtherpartitioned into sub-blocks, and the compressibility analysis specifiedin step i, is performed again on the blocks resulting from individualpartitioning, and iv, the block partitioning that will potentially yieldthe best results is chosen relying on results given by steps i and iii,and finally v, the transformation specified in step d, is carried outusing the block partitioning with the best potential results, relying onthe prediction specified in step c are also carried out during theprocess of encoding.
 35. Method implemented on a computer having aprocessor and a memory coupled to said processor for compressing adigitally coded video frame sequence, comprising the steps of a,dividing each frame into blocks that are to be separately coded, b,carrying out on the information content of the blocks a transformation(DCT) that converts spatial representation into frequencyrepresentation, producing thereby transformed blocks, and finally c,encoding the information contained in transformed blocks by entropycoding, and applying arithmetic coding as entropy coding, during which abit sequence is encoded by modifying the lower and upper limit of aninterval as a function of values of consecutive bits of the bitsequence, and the distribution of the already arrived bits of thesequence is taken into account in the function that modifies the limitsof said interval, characterised by that addresses are generated fromalready arrived bit values of the bit sequence, said addresses areapplied for addressing individual processing elements of a neuralnetwork comprising multiple processing elements, and parameters of theprocessing element are modified such that the frequency of individualaddressing operations and the value of the currently arriving bit of thebit sequence are used as input data, and the output of the neuralnetwork is applied for determining a parameter that modifies the loweror upper limit the interval, after an initial learning phase involvingthe processing of multiple bits, the upper or lower limits of theinterval being determined during the encoding of incoming bits as afunction of the output of the neural network, wherein at least some ofsteps a through c are performed using said processor.
 36. The methodaccording to claim 35, characterised by that the incoming bit sequenceto be encoded is fed into a buffer, and divided into multiple shorterbit sequences.
 37. The method according to claim 36, characterised bythat the binary value represented by the bits of the shorter bitsequences is regarded as an address.
 38. The method according to claim35, characterised by the addresses are being used for selecting rows ofa table, where said table contains function values representing thefrequencies of occurrence of the possible values of the current bit tobe coded, as well as at least one weight function.
 39. The methodaccording to claim 38, characterised by that the weight functions ofindividual neurons are modified as a function of the function valuesrepresenting the occurrence frequencies of the potential values of thebit to be coded.
 40. The method according to claim 35, characterised bythat potential address ranges of the addresses form unions with oneanother are at least partially overlapping.
 41. The method according toclaim 35, characterised by that the gain and the learning rate of theneural network are dynamically adjusted according to the boundaryconditions.
 42. The method according to claim 35, characterised by thatthe encoder is used with different levels, where parameters of eachlevel can be adjusted separately, with a neural network operating withdedicated parameters being assigned to each level.
 43. The methodaccording to claim 35, characterised by that steps of the methodaccording to for compressing a digitally coded video frame sequence,comprising the steps of a, dividing a given frame into blocks, b,optionally, further dividing individual blocks into smaller blocks, c,modifying the information content of selected blocks relying oninformation contained in a neighbouring block or blocks, d, generatingtransformed blocks by carrying out on the selected blocks atransformation (DCT) that-converts spatial representation into frequencyrepresentation, and finally e, encoding the information content of thetransformed blocks by entropy coding, characterised by that i,compressibility analysis is performed on said selected blocks beforecarrying out the transformation specified in step d, and, depending onthe result of the analysis ii, steps c, (prediction) and d, are carriedout on the block or iii, optionally, the block is further partitionedinto sub-blocks, and the compressibility analysis specified in step i,is performed again on the blocks resulting from individual partitioning,and iv, the block partitioning that will potentially yield the bestresults is chosen relying on results given by steps i and iii, andfinally v, the transformation specified in step d, is carried out usingthe block partitioning with the best potential results, relying on theprediction specified in step c are also carried out during the processof encoding.
 44. Method implemented on a computer having a processor and a memory coupled to said processor for compressing a digitally coded video frame sequence, comprising the steps of a, dividing a given frame into two-dimensional blocks, b, carrying out on the information content of blocks a transformation (DCT) that converts spatial representation into frequency representation, producing thereby transformed multiple-element two-dimensional blocks and c, modifying the elements of the transformed blocks according to external boundary conditions, and finally d, encoding the information contained in transformed blocks by entropy coding, characterised by that modification of the data of the transformed multiple-element two-dimensional blocks is carried out in step c, as a function of the output of a neural network, wherein at least some of steps a through d are performed using said processor.
 45. The method according to claim 44, characterised by that the neural network has a back propagation or counter propagation structure, or is a simple network composed of multiple neurons, where normalized values of expected/coded length and expected/coded quality are used as input data, a specific number of previously received input data and the current input data are stored in a time window (time slot), with the data contained in the time window being assigned to the input neurons of the neural network.
 46. The method according to claim 45, characterisedby that the number of neurons in the input layer of the network equalsthe number of data elements stored in the time window.
 47. The methodaccording to claim 46, characterised by that the network comprises ahidden layer.
 48. The method according to claim 47, characterised bythat the number of neurons in the hidden layer is larger than the numberof neurons in the input layer.
 49. The method according to claim 44,characterised by that normalized expected/coded length values andexpected/coded quality values are applied as input data, a predeterminednumber (N) of previously received input data elements (preferably N=31or N=63) are stored in a time window together with the current inputdata, and generating addresses based on the data contained in the timeslot, input data are rounded off to a given bit length for the addressgeneration process, an address is generated from each element of thetime window by means of a hash function, said address pointing to anelement of a table corresponding to one of the processing elements ofthe network.
 50. The method according to claim 49, characterised by thatthe neural network is pre-trained utilizing the expected/coded lengthand quality data in their original form, before they are rounded off foraddress generation.
 51. The method according to claim 49, characterisedby that addresses are generated by means of a hash function from thedata contained in the time window.
 52. The method according to claim 44,characterised by that minimum and maximum allowed bandwidth values areapplied as input data for two processing elements that are independentfrom the rest of the neural network.
 53. The method according to claim52, characterised by that results generated by the processing elementsof the network and the two independent processing elements appear at twooutputs.
 54. The method according to claim 52, characterised by that theoutput of the neural network is a frame size scaling factor and/or aquantization factor.
 55. The method according to claim 52, characterisedby that the output of the neural network is a frame size scaling factorand/or a quantization factor.
 56. The method according to claim 44,characterised by that steps of the method according to for compressing adigitally coded video frame sequence, comprising the steps of a,dividing a given frame into blocks, b, optionally, further dividingindividual blocks into smaller blocks, c, modifying the informationcontent of selected blocks relying on information contained in aneighbouring block or blocks, d, generating transformed blocks bycarrying out on the selected blocks a transformation (DCT) that-convertsspatial representation into frequency representation, and finally e,encoding the information content of the transformed blocks by entropycoding, characterised by that i, compressibility analysis is performedon said selected blocks before carrying out the transformation specifiedin step d, and, depending on the result of the analysis ii, steps c,(prediction) and d, are carried out on the block or iii, optionally, theblock is further partitioned into sub-blocks, and the compressibilityanalysis specified in step i, is performed again on the blocks resultingfrom individual partitioning, and iv, the block partitioning that willpotentially yield the best results is chosen relying on results given bysteps i and iii, and finally v, the transformation specified in step d,is carried out using the block partitioning with the best potentialresults, relying on the prediction specified in step c are also carriedout during the coding process.
 57. Apparatus for encoding digital video data, characterised by comprising a unit adapted for performing the steps of the method according to claim 1.
 58. A non-transitory computer-readable medium storing a program containing instructions which, when executed by at least one processor, causes the processor to perform the steps of the method of claim 1.
 59. A transmitter, comprising a processor operable to generate a coded sequence by performing the compression method according to claim 1.
 60. Method for decompressing encoded video data from a coded sequence produced by the compression method according to claim 1.